Closed Bug 41489 Opened 24 years ago Closed 3 years ago

HTTP authentication does not support non-ISO-8859-1 characters

Categories

(Core :: Networking: HTTP, defect, P5)

Tracking

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: morse, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed, intl, Whiteboard: [necko-triaged])

Attachments

(5 files, 18 obsolete files)

290 bytes, application/x-php
22.95 KB, patch
8.21 KB, patch
5.54 KB, patch
4.43 KB, patch
I was just looking over the code for doing HTTP authentication and I happened to notice that the username and password are being kept in a char*. That won't work with Japanese characters, for example. You need to either work exclusively with PRUnichar* or at some point do a conversion from PRUnichar to UTF-8 encoding.

I picked this up inadvertently in a code reading, so I attempted to see what would happen if I used Japanese characters for HTTP authentication. I tried it and got an assertion failure at nsBasicAuth.cpp line 60. The comment preceding that line says "we work with ASCII around these here parts."

You can continue to work with ASCII if you do the UTF-8 conversion first. Take a look at wallet.cpp and see how I handle double-byte characters and the UTF-8 conversion.
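
Roughly, the conversion I have in mind looks like this (a plain C++ sketch of the general UTF-16 to UTF-8 technique, not the actual wallet.cpp code):

    // Sketch: widen each UTF-16 code point into 1-4 UTF-8 bytes instead of
    // truncating it to a single char.
    #include <cstddef>
    #include <cstdint>
    #include <string>

    std::string Utf16ToUtf8(const char16_t* in, size_t len) {
        std::string out;
        for (size_t i = 0; i < len; ++i) {
            uint32_t cp = in[i];
            // Combine a surrogate pair into a single code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            if (cp < 0x80) {                       // 1 byte: ASCII unchanged
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {               // 2 bytes
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             // 3 bytes (e.g. Japanese)
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {                               // 4 bytes (outside the BMP)
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }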

BTW, you'll need to be able to enter and display a Japanese character set in order to demonstrate this problem. I asked the i18n team about that a long time ago and was given instructions (see bug 23037) for solving these two problems. Basically you first need to do the following simple steps:

1. Get a Japanese font from http://jazz/users/teruko/publish/fonts/ie3lpkja.exe and install it. You need to reboot the system after installation.

2. Modify the browser prefs to use that font (set the variable-width Japanese encoding to use MS Gothic). Now you can display Japanese characters.

3. To enter Japanese characters without a Japanese keyboard, copy and paste from http://babel/testdata/double_byte/Selected_data_sjis.htm.
Target Milestone: --- → Future
I know what "future" means -- it means we aren't going to do it. Is i18n comfortable with the fact that a Japanese user will never be able to enter his Japanese username in an HTTP authentication box? Is he able to do so now in 4.x?

It's not that difficult to continue working with ASCII and do a UTF-8 conversion first. I'll be glad to work with you if you don't know how to do it.
Steve -- the bugs are not being marked future because we don't know how to do them. We are prioritizing. We have much crueler bugs to look at before getting around to these. If you want to spend your free time on this -- go ahead.
Assignee: gagan → morse
Haven't heard any comments from the i18n team about this so I guess they don't 
consider this important after all.  Therefore I'll leave it futured and give it 
back to gagan.
Assignee: morse → gagan
Steve,
Thanks for the pro-active efforts on this.
I don't think non-ASCII username/passwords for HTTP authentication is a
high priority for PR3 or maybe even 6.0.  But does this gracefully degrade?
You mention you got an assertion.  In a non-debug build, what will happen?
I don't recall what happened after I got the assertion, it was a while ago that 
I tested this.
Teruko, Can we have someone try this with Japanese characters and add a
comment to this bug about what happens?
This is probably a more serious concern with the realm string.  The realm
will more than likely be sent in a localized form.  However, RFC 2616 states
that the realm string must be of type quoted-string, which means that it
can include any TEXT (including non-ascii characters) except for " and
certain control characters.  If the TEXT contains characters from character
sets other than ISO-8859-1, then the TEXT must be encoded according to the
rules of RFC 2047.  I haven't had a chance to read RFC 2047, so I don't know
exactly what that means. 
Blocks: 61681
->me
Assignee: gagan → darin
Component: Networking → Networking: HTTP
Target Milestone: Future → mozilla1.0
-> upping priority
Severity: normal → critical
Status: NEW → ASSIGNED
Keywords: mozilla1.0
Priority: P3 → P1
Target Milestone: mozilla1.0 → mozilla0.9.9
*** Bug 88432 has been marked as a duplicate of this bug. ***
do we have existing code for handling text strings encoded according to RFC 2047?
nsHttpBasicAuth.cpp still has this problem... here's the problem code:

    // we work with ASCII around here
    nsCAutoString userpass;
    userpass.AssignWithConversion(username);
    if (password) {
        userpass.Append(':');
        userpass.AppendWithConversion(password);
    }

according to RFC2617 section 2, the username and password need to be of type
TEXT before they are base64 encoded, and according to RFC2616 section 2.2:

  Words of *TEXT MAY contain characters from character sets other than
  ISO-8859-1 only when encoded according to the rules of RFC2047.

so, for properly interpreting the realm string, we need to decode using RFC2047
and for properly constructing an Authorization request header, we need to encode
using RFC2047.
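
for concreteness, an RFC2047 "B" encoded-word is just =?charset?B?base64(bytes)?=. a rough sketch of building one (Base64Encode here is a local helper for illustration, not an existing necko API):

    #include <cstddef>
    #include <cstdint>
    #include <string>

    static std::string Base64Encode(const std::string& in) {
        static const char k[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        std::string out;
        for (size_t i = 0; i < in.size(); i += 3) {
            uint32_t n = static_cast<unsigned char>(in[i]) << 16;
            if (i + 1 < in.size()) n |= static_cast<unsigned char>(in[i + 1]) << 8;
            if (i + 2 < in.size()) n |= static_cast<unsigned char>(in[i + 2]);
            out += k[(n >> 18) & 63];
            out += k[(n >> 12) & 63];
            out += (i + 1 < in.size()) ? k[(n >> 6) & 63] : '=';
            out += (i + 2 < in.size()) ? k[n & 63] : '=';
        }
        return out;
    }

    // e.g. MakeEncodedWord("UTF-8", "\xE5\x8D\x93") == "=?UTF-8?B?5Y2T?="
    std::string MakeEncodedWord(const std::string& charset,
                                const std::string& bytes) {
        return "=?" + charset + "?B?" + Base64Encode(bytes) + "?=";
    }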

so, it comes down to this: is there any existing support for RFC2047 within mozilla?
> so, it comes down to this: is there any existing support for RFC2047
> within mozilla?

Yes, because we encode mail headers according to rfc2047.
It's probably in source/mailnews/mime/src/

Naoki, can you help?
oh.. this is bad.  perhaps we need to move some of this mime stuff into a more
generic place.  necko cannot depend on mailnews being installed for proper intl
support.  suggestions?
>perhaps we need to move some of this mime stuff into a more generic place
Cc to mail engineers
can someone on the mail team comment on this bug?  i'm wondering if there is a
way to leverage some of the existing mime encoding/decoding facilities for HTTP.
 we need to handle RFC2047 encoding, but necko cannot depend on mailnews.

any suggestions on how best to handle this?
Severity: critical → major
Priority: P1 → P2
To encode, we need to specify a charset name; is that available in the code? Also, we need to convert the string from UTF-8 to that charset, so we need uconv. Does necko currently depend on uconv?
Taka is working on the MIME encoder; cc to him.
Do we really want to do this? I see lots of implications here for which it is not easy to get answers.

For instance, RFC-2617 "3.2.2 The Authorization Request Header" has the following BNF:

       credentials      = "Digest" digest-response
       digest-response  = 1#( username | realm | nonce | digest-uri
                       | response | [ algorithm ] | [cnonce] |
                       [opaque] | [message-qop] |
                           [nonce-count]  | [auth-param] )

       username         = "username" "=" username-value
       username-value   = quoted-string

And, RFC-2047 says:

       + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

I don't know how we can get over this conflict.
RFC2617 depends on the definitions from RFC2616, which specifies that
quoted-string can contain RFC2047 encoded words.

nhotta: necko currently depends on uconv.
-> future
Target Milestone: mozilla0.9.9 → Future
not critical for mozilla 1.0
Keywords: mozilla1.0
[disclaimer: I am unfamiliar with Digest authentication. I just want Basic to work.]

Is the suggestion that user/password values using non-ASCII characters should be
encoded something like:

  =?shift_jis?q?=93=fa?=:=?shift_jis?q?=96=7b?=

Is there any server-side software at all that will understand this? It seems a
break from established practice even if it is to the letter of the standard.

Using an encoding intended for mail-header parts, where there are many out-of-bounds characters, seems nonsensical for something that's about to be base64-encoded, i.e. something with no out-of-bounds characters. It's especially silly if the encoding is 'b' - as recommended for primarily non-US-ASCII input - as you then end up base64-encoding everything twice!

It seems Opera 6 just submits Basic auth values as UTF-8. This seems fairly reasonable to me. (Maybe any charset given in the Content-Type header of the response including the WWW-Authenticate header could be used as a hint, otherwise?) On the other hand, IE6 shares Moz's current inability to submit non-US-ASCII characters in auth.
yeah, i don't know why the spec doesn't say to use base64 encoded UTF-8.  that
would be fairly sensible, but there is a defined standard.  opera's
implementation sounds non-conformant, which is generally a bad thing.  internet
standards exist for a reason, right? ;-)
/me grudgingly accepts the Internet Standards Existing for a Reason argument.
(Although secretly I'd be quite happy if the Mozilla team said "Hey, look over
there! A dinosaur!" and hacked UTF-8 in whilst no-one was paying attention.)

PRO: as well as allowing non-Latin-1 characters to be used in
usernames/passwords, encoding them could finally allow colons to be used.
Currently all browsers submit an entered username of 'a:b' with password of
'c:d' as 'a:b:c:d', leaving the unlucky server to work out what was meant.

CON: as far as I know, no server software will support this. Certainly a quick
flick through the Apache 1.3 auth code and modules reveals no such handling.

Basic Authentication really isn't very well designed is it?
*** Bug 264044 has been marked as a duplicate of this bug. ***
*** Bug 263576 has been marked as a duplicate of this bug. ***
There was a comment in bug 263576 about IE doing this a bit better than us; I think it meant in displaying the realm.
QA Contact: tever → benc
*** Bug 340438 has been marked as a duplicate of this bug. ***
Bug 340438 contains headers sent by Firefox and IE when attempting to login with user name "卓" (U+5353).

We send Authorization: Basic UzpwYXNzd29yZA==
which decodes as S:password

IE sends Authorization: Basic 1786cGFzc3dvcmQ=
which decodes as <0xd7bf>:password. 0xd7bf is 卓 in cp936
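
The Firefox header shows the lossy step at work; a minimal sketch of what the code effectively does (hypothetical helper, for illustration only):

    #include <string>

    // Sketch: keep only the low byte of each UTF-16 unit, as the current
    // AssignWithConversion path effectively does. U+5353 becomes 0x53 ('S'),
    // which is why the header above decodes as "S:password".
    std::string LossyTruncate(const std::u16string& in) {
        std::string out;
        for (char16_t c : in)
            out += static_cast<char>(c & 0xFF);  // drop the high byte
        return out;
    }
    // LossyTruncate(u"\u5353:password") == "S:password" (before base64).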
(In reply to comment #30)

> We send Authorization: Basic UzpwYXNzd29yZA==
> which decodes as S:password

So, we're just truncating 0x5353 (=U+5353 in UTF-16) to 0x53 as expected.
 
> IE sends Authorization: Basic 1786cGFzc3dvcmQ=
> which decodes as <0xd7bf>:password. 0xd7bf is 卓 in cp936

IE perhaps uses the default character encoding ... while Opera uses UTF-8. 

And, RFC 2616 stipulates that RFC 2047 be used. This is a total mess. 
How about major web servers? I wonder whether Apache 2.x has anything better than Apache 1.x.

BTW, bug 295084 is about moving RFC 2047/2231 encoding routines from mail to necko.

Depends on: 295084
Assignee: darin → nobody
Status: ASSIGNED → NEW
QA Contact: benc → networking.http
OS: Windows NT → All
Priority: P2 → --
Hardware: PC → All
Summary: http authentication will not support double-byte characters → http authentication will not support non-ASCII characters
Target Milestone: Future → ---
*** Bug 337130 has been marked as a duplicate of this bug. ***
Summary: http authentication will not support non-ASCII characters → http authentication does not support non-ASCII characters
*** Bug 352953 has been marked as a duplicate of this bug. ***
The IETF is spinning some wheels on RFC2617 internationalization. I'll watch them and fix this when they have a solution.
Assignee: nobody → sayrer
Keywords: intl
OK, according to the spec, it's clear that non-encoded fields like realm should be RFC2047-encoded, but it's not clear which character set they should be encoded from. The same encoding conundrum applies to hexdigest- and base64-encoded fields.
Ick.

The WWW-Authenticate header is sent in response to a request, so the server may have client cues (Accept-Charset) to go on in formatting this header per RFC2047.  We (the browser) shouldn't need to care what's in the realm, except when we need to display it to the user, but even then, things should "work" even if we don't understand how to display it.  (Is the realm "foo" equivalent to the realm "=?US-ASCII?Q?foo?="?)

Andrew's comment 23 would seem to be the proper approach, according to RFC 2617.  This seems very inelegant, though, and I agree that nothing anywhere is going to support this any time soon, and in the mean time, lots of stuff will break.  I would advocate a change to the standards, moving to UTF-8 in base64-encoded fields.  The real constraint is the need for the headers to remain US-ASCII, and by encoding with base64, you've already accomplished that.
(In reply to comment #36)
> Ick.
> 
> The WWW-Authenticate header is sent in response to a request, so the server may
> have client cues (Accept-Charset) to go on in formatting this header per
> RFC2047.  We (the browser) shouldn't need to care what's in the realm, except
> when we need to display it to the user, but even then, things should "work"...

There is a general problem here, regarding HTTP authentication headers, but you have made conclusions that apply only to Basic authentication. For example, the Realm is part of the "response" field in Digest. That field is hex-encoded, so there are no immediate problems with it, but it's not clear how to encode the given fields in order to perform the Digest calculations. Authentication needs to work if the username is in Mongolian, the password is in Japanese, and the Realm contains a Euro symbol.

> 
> Andrew's comment 23 would seem to be the proper approach, according to RFC
> 2617.  This seems very inelegant, though, and I agree that nothing anywhere is
> going to support this any time soon, and in the mean time, lots of stuff will
> break.  I would advocate a change to the standards, moving to UTF-8 in
> base64-encoded fields.

Yeah, that is one option. Unfortunately, it is not compatible with IE, afaik.
Is there a way to push this? I made a bad mistake: yesterday I announced that each username would be an office name, and only later learned of this bug. When I decided usernames would be office names, I never imagined Fx would have an i18n problem in this area, because Fx holds the i18n crown IMHO. Now I am frustrated at having to announce something different again, since users come from all different places and any change in this organization is horrible. Yes, I should have checked Fx before setting any user-management policy, but who can expect such a bug?

Send the username the way IE does, or the way Opera does, or the way the RFC says; whatever you do, I can modify my web application to cope with it. It is only when you drop information from the username that it becomes impossible for the web developer to do anything to save the situation. ANY CHANGE is better THAN THE CURRENT SITUATION. However, I know it is unlikely in this atmosphere; things are like this because they used to be like this, and that is enough to ensure they can hardly be changed.

Please understand that this really breaks things completely for non-ASCII users. Sometimes I think of situations like Opera's or IE's: in a non-democratic organization a manager can make an arbitrary decision, and that happens to help solve problems faster. (Though one can argue it also helps create problems faster.)
Had a further rummage through the RFCs and to be honest none of it makes much sense. RFC2616's references to RFC2047 are in the context of TEXT, which in all the cases that matter to us are completely against the spirit of RFC2047 encoded-words-as-atoms - indeed in the case of the quoted-string realm, as Taka posted, it's also against the letter of RFC2047.

I think this is an error in 2616 (it wouldn't be the only one). HTTP is not explicitly an RFC822-family protocol: it defines its own header structures that are incompatible with 822 in significant ways instead of deferring to the older standard. If it really wanted to say we should be putting 2047 encoded-words inside 2616 quoted-strings, and base64-encoding them before putting them in auth headers, it should have specifically said so. Just referencing 'the rules of RFC2047' is not enough when those rules are incompatible with the rules of 2616.

In summary, the HTTP standard is self-contradictory and nobody else has managed to interpret and implement it in any way. So we should simply ignore it and not attempt to put 2047 encoded-words in HTTP headers at all(*).

The remaining options are: (a) continue to simply drop non-ISO-8859-1 characters on the floor (status quo), (b) just use UTF-8 (Opera method), (c) use the system codepage (IE method), and (d) guess.

If (a), we should at least stop non-ISO-8859-1 characters from being entered in the UI, to make it clear they're not permissible.

If (b), we potentially break any sites that are currently working with ISO-8859-1 (eg. if I have a username with an acute accent on it), when running on non-Western European machines. Are Opera users significantly affected by this? I haven't seen any reports of it.

If (c), we potentially break any sites not running in the same region as the client, and introduce an obscure dependency the user won't know how to sort out. How important is IE compatibility these days?

(d) might work, perhaps based on the Content-Type;charset of the page served to us with the WWW-Authenticate header, if any, falling back to pref intl.charset.default.
heh, forgot to add the off-topic footnote -

* - in the main entity headers, that is.

When posting eg. a multipart/form-data, the headers in each part are out of the scope of 2616, which explicitly (in section 3.7.2) hands their definition over to RFC2046, making them real 822-family headers, where RFC2047 and RFC2231 apply in the usual way.

That would mean technically we might be submitting non-ASCII form fields as something like:

  Content-Disposition: form-data; name*=iso-8859-1'en'n=E9e

However, RFC2388, which defines multipart/form-data, again makes the same mistake as RFC2616 and HTML 4.01 in recommending RFC2047 encoded-words for use inside parameter values, which is specifically disallowed in 2047. Sigh, them standards eh...

Of course in practice Mozilla should do neither, as it would solve little and break a lot, and continue with the current strategy of encoding field names to the form submission charset. Whether it is worth correctly following the quoted-pair scheme for backslash-escaping " and \ inside a quoted-string is another matter. (Opera does; IE does not.)
Eight years of this bug. Maybe it should be fixed? All you need is to encode the data to UTF-8, as I understand it.
HTTP does not allow non-latin1 headers, see RFC 2616 2.2 (definition of TEXT), they have to be encoded.
Is utf-8 not compatible with this standard?
"HTTP does not allow non-latin1 headers, see RFC 2616 2.2 (definition of TEXT),
they have to be encoded."

That doesn't mean we (the users) should only use latin1, which is what's happening now. I use my own language everywhere; why should I use latin1 because of Fx?

Whatever solution it is, just solve it. (UTF-8 is most attractive for being simple and effective).
All I was saying is that it's not as simple as comment 42 made it sound.
"Is utf-8 not compatible with this standard?"

Unfortunately, no. First, it is an RFC, which is a regulation or (strong) recommendation, not strictly a standard. However, I am not sure it is practical to wait for the IETF to correct it upstream before we do something. On the other hand, I am not sure if this is possible: Fx corrects it NOW (say, using UTF-8), then the IETF is under pressure because NO BROWSER FOLLOWS their RFC and the method used by most browsers is UTF-8, and then it has to recognize this fact and update the RFC, say, by 2028.

I am completely unsure how the IETF works and whether this is practical, but letting the slowness of RFC updates hurt the end-user experience is the downside of open source. If it were commercial software, there would be pressure from the user side not to wait for a mature standard (e.g. Opera and IE).
I support Zhang Weiwu. Why must non-English users be limited to ASCII characters?
This bug has been added to our (Mozilla Russia) bug bounty program. The bounty is $300. For more details visit the program page: http://www.mozilla-russia.org/contribute/bounty-en.html
Attached patch Patch for UTF-8 encode (obsolete) — Splinter Review
This patch changes the behaviour of HTTP authentication to send the username and password in UTF-8. After this patch, authentication in Mozilla happens exactly as in Opera.
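The core of the change is roughly this (a sketch only; see the attached patch for the real diff against nsHttpBasicAuth.cpp):

    // Sketch of the change in nsHttpBasicAuth::GenerateCredentials:
    // convert the UTF-16 identity to UTF-8 instead of lossily narrowing it.
    nsCAutoString userpass;
    CopyUTF16toUTF8(nsDependentString(username), userpass);   // was AssignWithConversion
    if (password) {
        userpass.Append(':');
        AppendUTF16toUTF8(nsDependentString(password), userpass);  // was AppendWithConversion
    }
    // userpass is then base64-encoded exactly as before.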
Attachment #315810 - Flags: review?(cbiesinger)
This PHP script implements simple HTTP authentication for testing this bug.
A user's voice: SUPPORT THIS MOVE!
It appears that FF indeed uses UTF-8, if and only if the request is sent via XMLHttpRequest (otherwise, as discussed above, ISO-8859-1). This really looks bizarre. Does this deserve its own bug?
(In reply to comment #50)
> Created an attachment (id=315810) [details]
> Patch for UTF-8 encode
> 
> This patch changes the behaviour of HTTP authentication to send the username
> and password in UTF-8. After this patch, authentication in Mozilla happens
> exactly as in Opera.

Since we should respect standards, I can suggest doing it only for non-Latin characters, and leaving ASCII for Latin usernames/passwords. Doing things this way we will not change current behavior, but we will fix the bug for non-Latin characters.
(In reply to comment #55)
> Since we should respect standards, I can suggest doing it only for non-Latin
> characters, and leaving ASCII for Latin usernames/passwords. Doing things this
> way we will not change current behavior, but we will fix the bug for non-Latin
> characters.

That doesn't compute. USASCII is a subset of UTF-8.

That being said: the spec says it's ISO-8859-1, and changing this would break existing servers.
(In reply to comment #56)
> That doesn't compute. USASCII is a subset of UTF-8.
> 
> That being said: the spec says it's ISO-8859-1, and changing this would break
> existing servers.
Oops, I meant leaving ISO-8859-1 for Latin characters and switching to UTF-8 for non-Latin. I don't think it can break anything, but it will help with server configurations that understand UTF-8.
> Oops, I meant leaving ISO-8859-1 for Latin characters and switching to UTF-8
> for non-Latin.

I don't think having mixed-encoding strings is a useful option; it makes no sense and is compatible with no other browser.

Having discarded the option of RFC2047 encoding on the grounds that it is a logical contradiction and not compatible with anything, the basic options are:

1. ISO-8859-1. Compatible with the RFC and with IE in the US and Western Europe.

2. The system codepage. Compatible with IE in all locales.

3. UTF-8. Compatible with Opera, never compatible with IE (the system codepage cannot be UTF-8 under Windows.)

In my opinion none of these are really satisfactory.

To 2 we can add 2a: pref intl.charset.default. This is eminently preferable to the system codepage in my view as it is easier for the user to set without negative cross-platform consequences.

Better would be if we could guess an encoding based on any charset given in the Content-Type of the response containing the original WWW-Authenticate header that drove the authentication request. (With fallback to options 1 or 2a where no such parameter is set.) However this is not a simple hack I can immediately see how to do; we would have to remember a charset everywhere auth details are stored.
Another option is to define "basic2" and "digest2", let them use UTF-8, and try to get this implemented in Apache and IIS as well.
(In reply to comment #58)
> Better would be if we could guess an encoding based on any charset given in the
> Content-Type of the response containing the original WWW-Authenticate header that
> drove the authentication request.

That would be the charset of the server's 401 page - but this is usually a standard page meaning that its charset isn't related to the charset of the page you are sending the password to.
> 2. The system codepage. Compatible with IE in all locales.

Though usable only by coincidence, as the server side wouldn't know what to decode to. My 2c would certainly be to go with the UTF-8 approach.
> this is usually a standard page meaning that its charset isn't related to the charset of the page

It's not necessarily a standard page. If you're writing a web application you'll be returning an error page, probably in the same charset as every other page, which will be in the same charset you want for the authentication (UTF-8 if you're sensible).

If you get the standard page, the web server probably won't be serving it with a charset parameter so it falls through to the default.

> Though usable only by coincidence, as the server side wouldn't know what to
decode to. My 2c would certainly be to go with the UTF-8 approach.

Yes, but compatible by coincidence is still better than compatible never!

Putting UTF-8 in as-is does break existing IE compatibility for the US and Western European locales, giving site authors no way to be compatible with both browsers.

> Another option is to define "basic2" and "digest2"

That would be nice, but a lot of work pushing it through WHAT-WG and browsers before site authors can use it! Is there any way it could be done retaining backwards compatibility for the cases where no non-ASCII characters are used? eg.

WWW-Authenticate: basic realm="My realm;charset=utf-8"

(Looks ugly if the browser doesn't strip off the trailer, but at least it would still work.)
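
A sketch of how a client might peel the hint off (purely illustrative; nothing ships this today):

    #include <string>
    #include <utility>

    // Split a realm like "My realm;charset=utf-8" into the display realm and
    // the encoding hint. Old browsers would just show the ugly trailer.
    std::pair<std::string, std::string>
    SplitRealmCharset(const std::string& realm) {
        const std::string key = ";charset=";
        size_t pos = realm.rfind(key);
        if (pos == std::string::npos)
            return {realm, ""};                   // no hint present
        return {realm.substr(0, pos), realm.substr(pos + key.size())};
    }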
(In reply to comment #62)
> > Another option is to define "basic2" and "digest2"
> 
> That would be nice, but a lot of work pushing it through WHAT-WG and browsers
> before site authors can use it! Is there any way it could be done retaining
> backwards compatibility for the cases where no non-ASCII characters are used?
> eg.
> 
> WWW-Authenticate: basic realm="My realm;charset=utf-8"
> 
> (Looks ugly if the browser doesn't strip off the trailer, but at least it would
> still work.)

The WHATWG isn't relevant for that. The HTTPbis WG (IETF) could be, although it's not chartered to define new authentication schemes.

That being said, new schemes can easily be introduced (from a technical p.o.v.) - the server can offer several different schemes, and the client can pick what it knows about.
This bug has been stalled for almost a decade now, and it's certainly not the only HTTP Auth bug that's been around this long.  It would be nice if we could make some progress on this.

My two cents:
Providing some mechanism to allow non-ASCII characters is better than providing no mechanism.
UTF-8 encoding makes the most sense of the options presented.  If the IETF standard is unimplementable as it seems to be from the above description, we should do what seems the most sensible IMO.

There's already a patch here for this (haven't checked for bitrot though), so if we can come to a consensus on the approach to take this bug can be resolved shortly.
"This bug has been stalled for almost a decade now, and it's certainly not the
only HTTP Auth bug that's been around this long.  It would be nice it we could
make some progress on this."

Yes.

"Providing some mechanism to allow non-ASCII characters is better than providing
no mechanism."

There is a mechanism, it's ISO-8859-1. There are servers out there that rely on it.

"UTF-8 encoding makes the most sense of the options presented.  If the IETF
standard is unimplementable as it seems to be from the above description, we
should do what seems the most sensible IMO."

The IETF standard is implementable, it "just" doesn't handle non-ISO-8859-1 characters.

"There's already a patch here for this (haven't checked for bitrot though), so
if we can come to a consensus on the approach to take this bug can be resolved
shortly."

Big -1. You will break stuff.

The right fix is to define a new scheme, let's say "Basic2", which differs from "Basic" in the encoding being UTF-8 instead of ISO-8859-1, and get it deployed. For instance, by adding support both in Firefox and Apache httpd.
Kyle> It would be nice if we could make some progress on this.

Well, yes, but no-one can agree which direction counts as ‘progress’! There is no one direction that improves every use case from what we have at the moment. Whilst I agree that UTF-8 is a more sensible encoding in general (and using it was my original suggestion), we'd lose the accidental compatibility with IE we currently have for Western European users (other than the differences between cp1252 and 8859-1).

Julian> There is a mechanism, it's ISO-8859-1.

I'd still dispute that RFC2616 says that. Whilst HTTP/1.1 headers are explicitly ISO-8859-1, authentication details are not an HTTP header, even if they are encapsulated in one. The base64 token decodes only to a bunch of bytes; there is no standards-endorsed way of turning that into characters.

Indeed, the current mechanism Moz uses isn't even ISO-8859-1 as such; it's just taking the least-significant byte of a UTF-16 codepoint. I would much rather have the user's configured default encoding used for this; this could retain the partial IE compatibility and allow other users to at least get sites to work until such time as a proper charset-signalling method is available.

> The right fix is to define a new scheme, let's say "Basic2"

Well, right or not, I can't see it getting support within another decade! :-(

Most importantly, could it be deployed in a backwards-compatible way? Do UAs even understand multiple ‘WWW-Authenticate’ challenges ATM? If not, do they at least read the first (Basic) challenge from a list of challenges?

If not, could we get away with adding an optional charset auth-param to Basic challenges and/or responses? (RFC2617 implies they should be extensible, but I don't know what real-world UAs and servers do here.) If not, all we're left with is the ugly option of an ‘X-Authorization-Charset’ request header.

Incidentally, the good news is that HTTPbis has recently removed the broken references to RFC2047 that got us all so confused above.
Andrew> I'd still dispute that RFC2616 says that. Whilst HTTP/1.1 headers are
Andrew> explicitly ISO-8859-1, authentication details are not an HTTP header,
Andrew> even if they are encapsulated in one. The base64 token decodes only to a
Andrew> bunch of bytes; there is no standards-endorsed way of turning that into
Andrew> characters.

I agree it's not obvious, and that the spec sucks with respect to that. But, it *does* say:

   base64-user-pass  = <base64 [4] encoding of user-pass,
                    except not limited to 76 char/line>
   user-pass   = userid ":" password
   userid      = *<TEXT excluding ":">
   password    = *TEXT

So it re-uses the TEXT production, and that is defined in RFC2616 with the words (Section 2.2):

"The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

    TEXT           = <any OCTET except CTLs,
                     but including LWS>"

So at least some people read this as defining it.

Andrew> Well, right or not, I can't see it getting support within another
Andrew> decade! :-(

It will never happen if nobody tries.

Andrew> Most importantly, could it be deployed in a backwards-compatible way?
Andrew> Do UAs even understand multiple ‘WWW-Authenticate’ challenges ATM?
Andrew> If not, do they at least read the first (Basic) challenge from a
Andrew> list of challenges?

We discussed that not so long ago in the context of http://tools.ietf.org/html/draft-broyer-http-cookie-auth-00, and my recollection is that only Opera had problems, and their developers signaled it was a known issue that needs to be fixed.

Andrew> If not, could we get away with adding an optional charset auth-param
Andrew> to Basic challenges and/or responses? (RFC2617 implies they should be
Andrew> extensible, but I don't know what real-world UAs and servers do here.)

The extension mechanism is "must ignore", in which case "old" UAs will do whatever they did for Basic, and thus fail to authenticate. If they did *not* ignore, it would be impossible to deploy.

Andrew> Incidentally, the good news is that HTTPbis has recently removed
Andrew> the broken references to RFC2047 that got us all so confused above.

They are gone from Part1, but Part7 (Auth) still delegates most to RFC2617 which still has it (HTTPbis is not chartered to revise RFC2617).
I just did a quick test with

  WWW-Authenticate: Basic "foo" enc=utf-8

and it seems that all (Windows) UAs ignore the parameter except for Opera and Chrome, for which I'll open bug reports.
Of course that should have been

  WWW-Authenticate: Basic realm="foo" enc=utf-8

which makes it work in Opera, but still fails in Chrome.
Just noticed this bug when reading the source trying to figure out why Digest auth completely fails with non-ASCII characters in the login even if within ISO-8859-1...
I have been watching this bug for years. I have decided to second Andrew Clover's earlier idea again, to try to move this 9-year-old bug a little step further.

I recommend adding a config option "charset for HTTP authentication" that defaults to UTF-8, or, if that is not acceptable to you all, defaults to ASCII, which is the current situation. This way at least admins can change this option to a reasonable value for all of their users in their own deployment. E.g. I would change it to UTF-8 as part of our desktop & notebook maintenance policy to make our own intranet work, and I can imagine another admin in a company whose intranet is M$-based changing it to GB18030 (or whatever charset is used by default in the majority of their company) to smooth the switch to Firefox without breaking existing web apps deployed on the MS stack.

I am afraid that if my suggestion is taken, the demand for a clean and final solution (defaulting to UTF-8 or a better RFC) will be weakened, so that the unreasonable default ASCII setting would sleep for a few decades, maybe until Firefox fades away from the market, if there is such a day.

I also wonder how Konqueror and Google Chrome handle this case.
It would be very good to have some data on how Safari and/or Chrome are handling this.  Does anyone know?  Also, do we know how well Opera's decision to use UTF-8 has turned out?
Chrome uses UTF-8.

Safari uses ISO-8859-1 (proper 8859, not cp1252) and silently fails to send any authorization header at all when there is a non-ISO-8859-1 character present in the username or password. (Safari also has some other fairly bad auth problems.)

It looks like every site is going to have dropped HTTP Basic Authentication in favour of cookies before this sees any resolution.
blocking2.0: --- → ?
(In reply to comment #74)
> Chrome uses UTF-8.

Hmm, if we go with that, I think the two implementations can force the issue. Let's do it.
This *will* break with existing servers, unless Firefox retries with ISO-8859-1 upon 401.

Again: the proper solution is either to extend Basic with a parameter, or to define a new authentication scheme.
(In reply to comment #76)
> This *will* break with existing servers, unless Firefox retries with ISO-8859-1
> upon 401.

Firefox has never sent anything other than ASCII, so no servers that work with Firefox now will break. No browser actually follows the spec here, afaik, so UTF-8 seems like just the thing. We should certainly try the patch I'm about to attach and see how things go.
Attachment #420886 - Flags: superreview?(bzbarsky)
Attachment #420886 - Flags: review?(bzbarsky)
Attachment #315810 - Attachment is obsolete: true
Attachment #315810 - Flags: review?(cbiesinger)
Comment on attachment 420886 [details] [diff] [review]
Convert basic credentials to UTF-8 prior to base64 encoding them

This is Andrey M.'s patch with a test added.
Attachment #420886 - Attachment is patch: true
Attachment #420886 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #77)

> Firefox has never sent anything other than ASCII, so no servers that work with
> Firefox now will break.

Are you sure? From the patch it looks like all we did was send the lower byte of a 2-byte UCS2 character, now any character between U+0080 and U+00FF will be encoded as 2 bytes (for UTF-8). So, an existing username/password like "MëtälMøøse" would stop working... Servers that assumed we were sending ISO-8859-1, Windows-1252, or other charsets that happen to overlap with this range would have largely worked fine.
(In reply to comment #80)
> 
> Are you sure? From the patch it looks like all we did was send the lower byte
> of a 2-byte UCS2 character, now any character between U+0080 and U+00FF will be
> encoded as 2 bytes (for UTF-8). So, an existing username/password like
> "MëtälMøøse" would stop working... 

oh shoot, I think you're right. More testing needed, I guess.
At least, is it possible to show the encoded content of popups like this one, which is unreadable: http://lib.homelinux.org
Oops. Glad to know something other than Opera works with UTF-8. Not that I hate Opera, but our deployed web applications have some bugs with Opera that no one can fix because they are rooted too deep. I am migrating my users (corporate deployment) to Chrome; changing from HTTP authentication to web-form-based authentication would create more problems than forcing users to change browsers. Luckily we have an IT department that can decide what users use.
Attachment #420886 - Flags: superreview?(bzbarsky)
Attachment #420886 - Flags: review?(bzbarsky)
Hmm.  Retrying with byte-truncation upon 401 might actually be reasonable...  and get us out of this impasse.  How hard would that be to do?
We could introduce a simple continuation-state object for Basic auth. It would keep track of whether we attempted to send the credentials in UTF-8 or in lossy-converted form. In case we failed with UTF-8, nsHttpBasicAuth::ChallengeReceived would then set *identityInvalid to PR_FALSE and move the state to lossy-convert, and on the following attempt it would set *identityInvalid to PR_TRUE as it does now, breaking the Basic auth cycle. nsHttpBasicAuth::GenerateCredentials would then just convert according to the state object. We have to invalidate the state object when the challenge (actually the realm) changes in ChallengeReceived.

Relatively simple. Makes sense?
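
In pseudo-C++ (hypothetical names, not the actual necko interfaces), something like:

    #include <string>

    enum class Encoding { kUtf8, kLossy };

    struct BasicAuthState {
        std::string realm;      // invalidate the state when the realm changes
        Encoding encoding = Encoding::kUtf8;
    };

    // Called when a 401 arrives for credentials we already sent; returns true
    // when the identity should be treated as invalid (re-prompt the user).
    bool OnRepeatedChallenge(BasicAuthState& s, const std::string& realm) {
        if (s.realm != realm) {              // different realm: start over
            s.realm = realm;
            s.encoding = Encoding::kUtf8;
            return false;
        }
        if (s.encoding == Encoding::kUtf8) { // UTF-8 failed: retry the same
            s.encoding = Encoding::kLossy;   // creds lossy-converted, silently
            return false;
        }
        return true;                         // both failed: break the cycle
    }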

Could this new code be potentially exploitable via a UTF-8 and lossy-convert username or password overlap (two usernames converted to the same byte representation, one by UTF-8 and the other by lossy conversion, for instance)?
(In reply to comment #85)
> 
> Could this new code be potentially exploitable via a UTF-8 and lossy-convert
> username or password overlap?

I don't think so, but maybe I'm missing the point. How would the attack work? The credentials are pretty much in the clear, so what does it matter?
Not blocking 1.9.3 on this.
blocking2.0: ? → -
Blocks: 546330
So one thing confusing me here... The attached patch doesn't change AppendQuotedString, right?  Why not?  See bug 546330.
I have started work on an Internet Draft defining an auth param expressing the server's expected character encoding, see http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-latest.html. Feedback, such as test cases, appreciated. Co-authors as well.
@Julian: Looks fairly simple and watertight, good. The comma-separated syntax is consistent with Digest auth, and of the browsers I have to hand, none have tripped on it:

Already use UTF-8 anyway:
    Opera 10
    Chrome 5

Uses a different encoding but has no problem with req param (so ASCII is still OK):
    Firefox 1-4  (UTF-16 lower bytes)
    Safari 4-5   (ISO-8859-1)
    IE:Mac 5     (MacRoman)
    IE 5-8       (CP_ACP)
    IEMobile 6-8 (CP_ACP)
    Netscape 4   (CP_ACP)
    Opera 5      (CP_ACP)

Assuming 'Deployment Advice' is to be aimed at webapp/server authors (rather than browser vendors), testing incoming credentials against both UTF-8 and ISO-8859-1 is a good start, though it should be stressed that this doesn't fully cover real-world browser behaviour.

The system codepage of Western Windows machines (cp1252) is close enough to ISO-8859-1 for the purposes of the simple Western European accented letters the encodings share, but other marks like smart quotes or the euro won't come through. And of course in other territories sites are left guessing what codepage their users have, like the Japanese sites where you can only log in with IE using cp932.
Assignee: sayrer → hurley
Attachment #420886 - Attachment is obsolete: true
Attachment #521605 - Flags: superreview?(bzbarsky)
Attachment #521605 - Flags: review?(bsmith)
It is more important to be compatible with the servers than it is to be compatible with Opera and Chrome, since the servers are the ones we have to work with. How do the most recent versions of IIS, Apache, Tomcat, and PHP (at least) decode this? If that's UTF-8, then great. If not, then we should provide the server a way to say "use UTF-8" instead. The enc parameter proposed by Julian seems like a reasonable approach, but also it could just be flagged in a separate header. Also, when we're encoding in UTF-8, we should send some indication (e.g. in another header field), besides the User-Agent, that we are using UTF-8, unless we have evidence that servers are already expecting UTF-8.

The idea of trying one encoding and then trying another will work poorly with sites that lock your account after a small number of login failures. Plus, it is added complexity that we don't need.
(In reply to comment #93)
> It is more important to be compatible with the servers than it is to be
> compatible with Opera and Chrome, since the servers are the ones we have to
> work with. How do the most recent versions of IIS, Apache, Tomcat, and PHP (at
> least) decode this? If that's UTF-8, then great. If not, then we should provide
> the server a way to say "use UTF-8" instead. The enc parameter proposed by
> Julian seems like a reasonable approach, but also it could just be flagged in a
> separate header. Also, when we're encoding in UTF-8, we should send some

I believe we can get away with the new parameter; after all, that's what they are for.

> indication (e.g. in another header field), besides the User-Agent, that we are
> using UTF-8, unless we have evidence that servers are already expecting UTF-8.

See <http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-00.html#rfc.section.B.3>.

I'm not convinced it's needed; could you explain a use case where the server could take advantage of that information?

(I'm not strictly against it; I'd just like to avoid information nobody needs).

> The idea of trying one encoding and then trying another will work poorly with
> sites that lock your account after a small number of login failures. Plus, it
> is added complexity that we don't need.

I agree that this might be a problem; on the other hand it might be the way of least resistance, although a hack.

-> http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-01.html#rfc.section.A.1.1 (will mention this in the next revision)
(In reply to comment #94)
> I believe we can get away with the new parameter; after all, that's what they
> are for.

RFC2617 doesn't provide for new parameters for BASIC.

> > indication (e.g. in another header field), besides the User-Agent
> 
> I'm not convinced it's needed; could you explain a use case where the server
> could take advantage of that information?

Some servers will be able to just try to decode using multiple encodings until one works. But, if the account lockout controls are behind the web server (which is the case for some Mozilla services, for example), then this could lead to similar account-lockout problem I mentioned above for the case where clients retry with multiple encodings. If the server knows which encoding the client used then it can choose that encoding on the first try and avoid the issue. User-Agent isn't the appropriate way of conveying this information about the encoding and/or whether the UA ignored the server's indication of the encoding it wants to use.

The nice thing about Julian's parameter-based solution or a new server-sent header field is that it is easy for server admins to configure their servers to work with us without code changes, using mod_headers or similar, as long as we are willing to support encodings other than UTF-8 like cp932 (for Japanese) sites. Any other solution requires server code changes and means that we are still going to be incompatible with some existing sites.

The problem I see with Julian's parameter-based solution is that it seems like it will break some parsers (like Chrome's), and those parsers are compliant because RFC2617 doesn't require parsers to accept additional parameters. New header fields (one for Proxy-Auth and one for Auth) make more sense, considering that maximizing interoperability between new and old implementations is the goal.
(In reply to comment #95)
> (In reply to comment #94)
> > I believe we can get away with the new parameter; after all, that's what they
> > are for.
> 
> RFC2617 doesn't provide for new parameters for BASIC.

RFC 2617 says that the challenge can contain auth params, it just doesn't define any for Basic.

Anyway, the spec would update 2617 anyway. 

> > > indication (e.g. in another header field), besides the User-Agent
> > 
> > I'm not convinced it's needed; could you explain a use case where the server
> > could take advantage of that information?
> 
> Some servers will be able to just try to decode using multiple encodings until
> one works. But, if the account lockout controls are behind the web server

That sounds like a potential security hole. Do you have evidence of servers behaving like that? And if they do, wouldn't it make sense to always start with UTF-8?

> (which is the case for some Mozilla services, for example), then this could
> lead to similar account-lockout problem I mentioned above for the case where
> clients retry with multiple encodings. If the server knows which encoding the
> client used then it can choose that encoding on the first try and avoid the
> issue. User-Agent isn't the appropriate way of conveying this information about
> the encoding and/or whether the UA ignored the server's indication of the
> encoding it wants to use.
> 
> The nice thing about Julian's parameter-based solution or a new server-sent
> header field is that it is easy for server admins to configure their servers to
> work with us without code changes, using mod_headers or similar, as long as we
> are willing to support encodings other than UTF-8 like cp932 (for Japanese)
> sites. Any other solution requires server code changes and means that we are
> still going to be incompatible with some existing sites.
> 
> The problem I see with Julian's parameter-based solution is that it seems like
> it will break some parsers (like Chrome's), and those parsers are compliant

If it breaks Chrome, let's raise a bug report.

> because RFC2617 doesn't require parsers to accept additional parameters.

It does:

  challenge   = auth-scheme 1*SP 1#auth-param

So recipients must accept params; it just doesn't require them to do anything useful with them for Basic.

If you believe there's a spec problem, then let's fix it in draft-ietf-httpbis-p7-auth (which replaces the authentication framework parts of 2617).

> header fields (one for Proxy-Auth and one for Auth) make more sense,
> considering maximizing interoperability between new and old implementations is
> the goal.

Implementations currently update fast. The only thing that would scare me would be a problem in IE8.
The servers themselves don't do anything involving character set encoding/decoding of basic auth. To them, it's just a sequence of bytes that gets hashed using whatever hash they may use (if any) and compared to the stored password (hashed or not). So that brings us to the question of how are the stored hashes computed? In the case of apache with a htpasswd file, it's based entirely on whatever input encoding the user happens to be using when they run "htpasswd". For apache with some other backing store for the passwords, as well as PHP using some random backing store, it will be whatever encoding is used by whatever piece of software reads the password from the user and writes it (in hashed or otherwise form) to the backing store. Tomcat's hash generator program takes a "-e" argument to tell it what encoding to use. By default, IIS is tied to only using windows accounts, so I imagine that's all done using the system codepage (I couldn't find any documentation on using IIS with non-windows passwords in my quick googling).

So where does this leave us? For the most part, we're at the mercy of whoever wrote whatever software is requiring authentication, or worse yet, the user that set his/her own password. Even if we could get all UA vendors and all server vendors to implement a scheme where the UA tells the server what encoding it used, that doesn't do us much good unless everyone rewrites their custom-rolled basic auth systems to honor the server when it tells them "Hey, the UA gave me this password encoded in UTF-8".

Just using UTF-8 has the advantages of (1) not breaking ASCII passwords that already work with Fx just fine, and (2) giving us SOME sort of compatibility with non-Latin character sets that makes some sort of sense, instead of just stripping out all but the lowest byte of whatever characters were input. We still won't be compatible with non-Latin IIS installations that use the system codepage for usernames/passwords, but we weren't before. If UTF-8 is made the default, we also lose default compatibility with Latin charsets when a username/password contains a non-ASCII character, but that is mitigated by a pref to enable or disable UTF-8 encoding of Basic auth info (see my current patch).
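
Concretely, the pref-gated choice amounts to something like this (a sketch with an illustrative pref flag, reusing the two conversions sketched in earlier comments):

    // Sketch: pick UTF-8 or the legacy lossy conversion when building the
    // user:pass string, depending on a hidden pref. Utf16ToUtf8 and
    // LossyTruncate are the sketches from earlier comments in this bug.
    std::string BuildUserPass(const std::u16string& user,
                              const std::u16string& pass,
                              bool useUtf8 /* from the pref */) {
        auto convert = [&](const std::u16string& s) {
            return useUtf8 ? Utf16ToUtf8(s.data(), s.size())
                           : LossyTruncate(s);
        };
        return convert(user) + ":" + convert(pass);  // then base64-encode
    }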
Comment on attachment 521605 [details] [diff] [review]
Allow converting basic creds to UTF-8 before base64 encoding

One nit.  This pattern:

>+    if (prefs)
>+        if (NS_FAILED(prefs->GetBoolPref(kUseUTF8, &utf8auth)))

is probably better as:

  if (prefs && NS_FAILED(prefs->GetBoolPref(kUseUTF8, &utf8auth)))

(less indentation and such).

sr=me with that fixed, for getting this in and then at least asking some users in non-Western locales (and maybe some Western ones) to flip the pref so we can gather some data on what servers are actually expecting....

We'll need to broadcast the need for such testing somewhat; maybe we can ask the Mozilla Europe and Mozilla Japan folks to help.  Please let me know if you need help with that part once you land the patch, ok?
Attachment #521605 - Flags: superreview?(bzbarsky) → superreview+
(In reply to comment #96)
> RFC 2617 says that the challenge can contain auth params, it just doesn't
> define any for Basic.

>   challenge   = auth-scheme 1*SP 1#auth-param
> 
> So recipients must accept params; it just doesn't require them to do
> anything useful with them for Basic.

But, for BASIC it is restricted to:

    challenge   = "Basic" realm

Anyway, bug 656213 is now the bug about Gecko discovering what encoding the server wants us to use. By default, we will use UTF-8 unless/until there's a different consensus reached between us and the other browsers.
(In reply to comment #99)
> (In reply to comment #96)
> > RFC 2617 says that the challenge can contain auth params, it just doesn't
> > define any for Basic.
> 
> >   challenge   = auth-scheme 1*SP 1#auth-param
> > 
> > So recipients must accept params; it just doesn't require them to do
> > anything useful with them for Basic.
> 
> But, for BASIC it is restricted to:
> 
>     challenge   = "Basic" realm

No, it's not "restricted". This is an extension point.

> Anyway, bug 656213 is now the bug about Gecko discovering what encoding the
> server wants us to use. By default, we will use UTF-8 unless/until there's a
> different consensus reached between us and the other browsers.

Using UTF-8 will break existing deployments that rely on ISO-8859-1.
(In reply to comment #100)
> Using UTF-8 will break existing deployments that rely on ISO-8859-1.

These ISO-8859-1 passwords already don't work in Firefox (unless they are already valid UTF-8), so it would not be a regression in our behavior. And, like I said, if/when there's consensus on an alternative solution, we will almost definitely implement that solution. (But, I'm not holding my breath on there being a consensus any time soon.)
(In reply to comment #101)
> (In reply to comment #100)
> > Using UTF-8 will break existing deployments that rely on ISO-8859-1.
> 
> These ISO-8859-1 passwords already don't work in Firefox (unless they are

As far as I can tell, they do work, because Firefox just drops the top 8 bits, and thus ISO-8859-1 passes through unharmed. The same, IMHO, is true for IE. See the summary in comment 58.

> ...
(In reply to comment #102)
> As far as I can tell, they do work, because Firefox just drop the top 8
> bits, and thus ISO-8859-1 passes through unharmed. The same IMHO is true for
> IE. See summary in comment 58.

Thanks Julian. The key point that was overlooked previously is that codepoints 0-255 are the same in ISO-8859-1 and Unicode, so chopping off the high byte works pretty well for sites that are expecting ISO-8859-1, which is probably most of the Americas and Western Europe. Consequently, switching to UTF-8 by default would *not* be a regression-free change, as I think some were assuming.

Our goal here is to get non-ISO-8859-1 characters in usernames and passwords to work as well as the ISO-8859-1 characters already do. We don't have any evidence that switching to UTF-8 actually improves things here. We should avoid switching the default to UTF-8 unless we have some evidence that it actually helps. I filed two bugs blocking this one to improve things here.
Summary: http authentication does not support non-ASCII characters → HTTP authentication does not support non-ISO-8859-1 characters
Depends on: 656213
No longer depends on: 295084
Comment on attachment 521605 [details] [diff] [review]
Allow converting basic creds to UTF-8 before base64 encoding

I am r-'ing the patch because it would cause regressions in current behavior.
Attachment #521605 - Flags: review?(bsmith) → review-
Hold on.  Did you read the patch?  The patch adds a _pref_ to allow the user to switch to UTF-8 for the encoding for auth.  The default value of the pref is set such that the current ISO-8859-1 behavior is used, precisely because a wholesale switch to UTF-8 would not be backwards-compatible, but the patch gives users the _option_ of doing something else.  See comment 98 for why I think such a pref could be useful.
(In reply to comment #105)
> Hold on.  Did you read the patch?  The patch adds a _pref_ to allow the user
> to switch to UTF-8 for the encoding for auth.  The default value of the pref
> is set such that the current ISO-8859-1 behavior is used, precisely because
> a wholesale switch to UTF-8 would not be backwards-compatible, but the patch
> gives users the _option_ of doing something else.  See comment 98 for why I
> think such a pref could be useful.

All the information we've been given indicates that UTF-8 is the one encoding least likely to work. I expect the IE way (which is never UTF-8) to dominate, especially in Asian markets that are IE-centric, like China, Korea, and Japan. The patch would be useful if it let the user select a non-UTF-8 encoding to use, because then the user could experiment with different settings and report back what encoding each of the sites they are testing is actually using, instead of being limited to "ISO-8859-1", "UTF-8", and "not ISO-8859-1 and not UTF-8".
I tried the patch with IIS and an account name with diacritics. Either way (without the patch, or with the patch and the pref turned on) I could not log in.

I can try the same on my virtual Linux box, if someone else doesn't already have a test environment to try.
(In reply to Nick Hurley [:hurley] from comment #97)
> The servers themselves don't do anything involving character set
> encoding/decoding of basic auth. To them, it's just a sequence of bytes that
> gets hashed using whatever hash they may use (if any) and compared to the
> stored password (hashed or not). So that brings us to the question of how
> are the stored hashes computed? In the case of apache with a htpasswd file,
> it's based entirely on whatever input encoding the user happens to be using
> when they run "htpasswd". For apache with some other backing store for the
> passwords, as well as PHP using some random backing store, it will be
> whatever encoding is used by whatever piece of software reads the password
> from the user and writes it (in hashed or otherwise form) to the backing
> store. Tomcat's hash generator program takes a "-e" argument to tell it what
> encoding to use. By default, IIS is tied to only using windows accounts, so
> I imagine that's all done using the system codepage (I couldn't find any
> documentation on using IIS with non-windows passwords in my quick googling).
> 
> So where does this leave us? For the most part, we're at the mercy of
> whoever wrote whatever software is requiring authentication, or worse yet,
> the user that set his/her own password. Even if we could get all UA vendors
> and all server vendors to implement a scheme where the UA tells the server
> what encoding it used, that doesn't do us much good unless everyone rewrites
> their custom-rolled basic auth systems to honor the server when it tells
> them "Hey, the UA gave me this password encoded in UTF-8".
> ...

Point taken; but in these cases the problem essentially is not solvable, right? After all, the *user* enters characters, not octets, in their password prompt.

I don't believe we should reject a fix just because it doesn't fix a problem that by definition cannot be solved.
(In reply to Julian Reschke from comment #111) 
> Point taken; but in these cases the problem essentially is not solvable,
> right? After all, the *user* enters characters, not octets, in their
> password prompt.
> 
> I don't believe we should reject a fix just because it doesn't fix a problem
> that by definition can not be solved.

For the record, I'm fairly certain that was the point I was trying to make, as well, but it's been so long since I made that comment, that I'm a little fuzzy on the details :)
Assignee: hurley → nobody
My 2 cents: the current version of Firefox (20 under Windows) will still not work with a non-ASCII userid (at least with Digest authentication).
However, a non-ASCII password with an ASCII username is handled "correctly".

Actually, a non-ASCII password with a non-ASCII username is handled correctly, right up until the username is added to the response header.
This is because the nsHttpDigestAuth::AppendQuotedString() method in nsHttpDigestAuth.cpp checks for non-ASCII characters and fails if any are present.

What about having a version of AppendQuotedString() (say AppendUTF8QuotedString) that is used to append the username to the response header?

Actually this is what I have done today: I downloaded, built, modified nsHttpDigestAuth.cpp, rebuilt, and it seems to work very well (at least, it works like some other browsers do...), and it does NOT break anything AFAIU...

BTW, I would like to submit the modified version, but I'm not sure how to do it: can I just check out nsHttpDigestAuth.cpp and then check it in..?

Any comments are welcome.
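For concreteness, with a hypothetical username "olé" the modified code would emit a header along these lines (the other parameter values are invented for the example):

Authorization: Digest username="ol\xC3\xA9", realm="example", nonce="...", uri="/", response="..."

where \xC3\xA9 stands for the two raw UTF-8 bytes of "é" sent inside the quoted-string.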
Why not
I've read that I must have this bug assigned to me to submit a modification.

Can someone with the required authority do that please?

Or how do I "take" the bug as suggested by the text beside the "Assigned To" field : "Nobody; OK to take it and work on it" ?

Thank you.
Assignee: nobody → ggo98
btw you don't really need to be assigned the bug to attach patches and request reviews. Note that there is already a patch here with an r- attached to it. Best to understand the issues there (talking to the author, reviewer, and other commenters) before reliving the experience :)
As the author of that (long ago) r-'d patch, I was just about to chime in :)

I will start by saying this: please don't let any of my notes below discourage you from contributing to this (or any other!) bugs. This is all just background information for why (unless people's minds have changed) I would be surprised if we accept a "switch to UTF-8" approach for this bug. I definitely would like to see this bug fixed, though!

Now that that's been said, here's the info dump:

One concern (if I remember correctly) is that "just" changing auth to be UTF-8 (or having a pref for that, as in my patch) could easily lead to silent breakage from servers that have (over the 12+ years this bug has existed) been configured to expect our broken behavior.

Even worse, sending UTF-8 auth may fix some sites, but break others, so we would need to have some way to detect a server that broke with UTF-8 and change the encoding on the fly to fall back to ASCII.

I had a patch that, at one point, did the fallback, but there was yet another problem: some (many?) sites enforce a lockout after N failed login attempts, with 3 being a (in my experience) common value for N. So imagine the case where a user is logging into a server that breaks with UTF-8 but works with ASCII. Imagine again that the user enters their password incorrectly on the first try. We then have this flow of events:

1. User sends incorrect password, encoded incorrectly (first login failure)
2. We automatically fall back to the expected encoding, but still with the wrong password, and send that (second login failure)
3. User gets prompted for their password again, and enters the correct password
4. We send the correct password, encoded incorrectly (third login failure)

At this point, assuming N=3 for lockout, the user is locked out of their account, even though (in their experience) they've only tried entering their password twice, instead of the 3 chances that they are supposed to get.

The final issue was one of incompleteness. Adding UTF-8 as an option may gain us a few more sites that work, but the risk for breakage with other sites (and the lack of fixing even more sites) seems to have outweighed the limited gains we could have made with the "simple fix".

Extensions to the HTTP spec to allow the server to say "encode your authentication in encoding X" have been suggested, but I don't believe any of them have gotten anywhere.

Unfortunately a large portion of this discussion happened between me and bsmith in person, so there wasn't a record of it in this bug (or if there was, it's scattered in many different comments, which makes it hard to get at).
Ok, thank you :)
(In reply to Nick Hurley [:hurley] from comment #116)
> [full quote of comment 116 snipped]


Hi Nick,

first, thank you very much for your comments and the valuable information you provide.

I've seen your patch, and I've seen that there has been a lot of discussion about this issue, and that some thought it was not a good idea to send back UTF-8 encoded data (the username in this case).



I understand very well the need not to break existing solutions, but note that I'm talking about Digest authentication;
I have not checked precisely how Basic authentication is implemented up to now, because it's really unsafe (except over SSL).
Please correct me if I'm wrong, but I believe that your patch is for Basic authentication.

What I understand from the existing Digest authentication code is that it already handles the creation of the response_digest using UTF-8.
The code "fails" when adding the username to the response header, because only non-control ASCII characters are allowed, which is what the original specification says (once again I may be wrong, but that is what it looks like according to what I've been reading up to now).

Now, here is what will happen if the username is sent in the response header with the UTF-8 encoding (which is what some other browsers already do):
1) a username containing only ASCII characters maps to the same byte string in UTF-8, so nothing is broken for existing implementations.
2) a username containing non-ASCII Unicode characters maps to a string containing non-ASCII bytes (but no control characters).
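For example (byte values are mine, for illustration): the username "José" becomes the UTF-8 bytes 4A 6F 73 C3 A9; the ASCII letters are unchanged, and only "é" turns into the two high-bit bytes C3 A9.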

On the server side, the digest authentication module (called "DAM" below) receives the username with bytes having the highest bit set to 1 (= bytes with a value >= 128).

I think that the DAM *should* see this and, if it follows the specification strictly, reject the authentication attempt without any further processing.

But the DAM may not do that, and some problems may occur here.
Here are the possible consequences
(maybe there are others I'm not aware of yet):

Problem: the DAM might just IGNORE the highest bit, and, if we are unlucky enough, the resulting username (called "username7bits" below) might match an existing userid. This leads to two possible consequences:

a) After the usual 3 login attempts, the "username7bits" is locked, as you mentioned.

b) Since MD5 has collisions (I'm no expert on this, I just know of it like most of us), the digest answer *might* match what would have been the digest for "username7bits" + its password. In this unlikely case, the original username might be authenticated as "username7bits"...

Now, the root problem on the server side in this possible implementation is that the DAM truncates the original username bytes,
so... this is a problem with the server, not a problem with Firefox!
If a server does this, then it *must* be fixed on the server.
And the fix is simple: if UTF-8 support for the username is not wanted on the server, then the authentication attempt has to be rejected, without any other processing.

Last words: the idea that Firefox is going to be the *only* Web browser
which supports only ASCII characters in the username sounds bad to me.

So... I will try to submit the minor modifications to mozilla-central;
if they are rejected, there will probably be good reasons, regarding respecting the standards, or whatever.
Unfortunately, users who wish to use Unicode user names with Digest authentication (and it does happen) will still not be able to use Firefox...

Maybe I'm completely wrong about this, so I would be glad if you and/or others wish to add comments/remarks.

Have a good day :)
Setting the network.auth.digest-response-header-username-utf8 pref flag to true (through about:config) enables the new behavior regarding the username in the digest authentication response header.
Attachment #734492 - Flags: review+
Attachment #734492 - Flags: feedback+
Attachment #734492 - Flags: checkin+
Comment on attachment 734492 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

Hi - Thanks for the patch.

Please see
https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch
for information on how the review flags work. You need to submit r? to someone to get a review. You added r+, which indicates you have given the code a positive review; you can't review your own code. You also set checkin+, which indicates the patch is ready for checkin, and we can't do that without the right reviews.

So your next step is to request a review. You probably want to request it from bsmith because he had issues with the prior approach. You should also add to the comment trail in the bug how this effort differs from the last one (or why we should change our minds about the reasons Nick and Brian have mentioned in the comment trail).

Thanks for the work!
Attachment #734492 - Flags: review+
Attachment #734492 - Flags: feedback+
Attachment #734492 - Flags: checkin+
(In reply to Patrick McManus [:mcmanus] from comment #120)

Hi, thank you for the information :) 
I'll keep your reply in order to remember how to submit a patch correctly.
I will read again the "how to submit a patch" documentation in order to know how to ask for a review.

Olivier.
Comment on attachment 734492 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

This patch allows the HTTP digest authentication module to send back a UTF-8 encoded username in the response header, in order to enable support for non-ASCII usernames.
It can be enabled by setting the "network.auth.digest-response-header-username-utf8" flag to true in the FF config (about:config).

The patch could be improved by using the charset parameter as defined in RFC 2831, even if the flag is not set. I can probably do that if asked.
Attachment #734492 - Flags: review?(bsmith)
Attachment #734492 - Attachment is obsolete: true
Attachment #734492 - Flags: review?(bsmith)
This patch allows the HTTP digest authentication module to send back a UTF-8 encoded username in the response header, in order to enable support for non-ASCII usernames.
It can be enabled by setting the "network.auth.digest-response-header-username-utf8" flag to true in the FF config (about:config).

The patch could be improved by using the charset parameter as defined in RFC 2831, even if the flag is not set. I can probably do that if asked.
Attachment #740318 - Flags: review?(bsmith)
Comment on attachment 740318 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

>diff -r fd264d551130 modules/libpref/src/init/all.js
>--- a/modules/libpref/src/init/all.js	Fri Apr 19 07:45:15 2013 -0400
>+++ b/modules/libpref/src/init/all.js	Mon Apr 22 18:13:28 2013 +0200
>@@ -1280,6 +1280,14 @@
> // Specify if the gss lib comes standard with the OS
> pref("network.negotiate-auth.using-native-gsslib", true);
> 
>+// Controls whether to allow sending back UTF8 username 
>+// or ASCII only characters in the digest authentication response header
>+// False: non-ascii usernames will cause the authentication to fail, 
>+// (which is the default behavior up to now).
>+// True: non-ascii usernames will be sent back UTF-8 encoded in the 
>+// in the digest authentication response header .
>+pref("network.auth.digest-response-header-username-utf8", false);
>+
> #ifdef XP_WIN
> 
> // Default to using the SSPI intead of GSSAPI on windows 
>diff -r fd264d551130 netwerk/protocol/http/nsHttpDigestAuth.cpp
>--- a/netwerk/protocol/http/nsHttpDigestAuth.cpp	Fri Apr 19 07:45:15 2013 -0400
>+++ b/netwerk/protocol/http/nsHttpDigestAuth.cpp	Mon Apr 22 18:13:28 2013 +0200
>@@ -156,6 +156,8 @@
>   return NS_OK;
> }
> 
>+static const char kAllowUTF8UserNameInResponseHeader[] = "network.auth.digest-response-header-username-utf8";
>+
> NS_IMETHODIMP
> nsHttpDigestAuth::GenerateCredentials(nsIHttpAuthenticableChannel *authChannel,
>                                       const char *challenge,
>@@ -178,6 +180,13 @@
>   bool isDigestAuth = !PL_strncasecmp(challenge, "digest ", 7);
>   NS_ENSURE_TRUE(isDigestAuth, NS_ERROR_UNEXPECTED);
> 
>+  // we work with ASCII around here
>+  nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID);
>+  bool allowUTF8UserNameInResponseHeader = false; // Default to the old behavior
>+  if (prefs)
>+    if (NS_FAILED(prefs->GetBoolPref(kAllowUTF8UserNameInResponseHeader, &allowUTF8UserNameInResponseHeader)))
>+      allowUTF8UserNameInResponseHeader = false;
>+
>   // IIS implementation requires extra quotes
>   bool requireExtraQuotes = false;
>   {
>@@ -314,7 +323,10 @@
>   nsAutoCString authString;
> 
>   authString.AssignLiteral("Digest username=");
>-  rv = AppendQuotedString(cUser, authString);
>+  if (allowUTF8UserNameInResponseHeader)
>+    rv = AppendUTF8QuotedString(cUser, authString);
>+  else
>+    rv = AppendQuotedString(cUser, authString);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>   authString.AppendLiteral(", realm=");
>@@ -688,4 +700,40 @@
>   return NS_OK;
> }
> 
>+nsresult
>+nsHttpDigestAuth::AppendUTF8QuotedString(const nsACString & value,
>+                                     nsACString & aHeaderLine)
>+{
>+  nsAutoCString quoted;
>+  nsACString::const_iterator s, e;
>+  value.BeginReading(s);
>+  value.EndReading(e);
>+
>+  //
>+  // Encode string according to RFC 2616 quoted-string production, 
>+  // but with NON-ascii characters allowed.
>+  // (not a standard, but what other browsers already 
>+  // do to support username with unicode characters)
>+  quoted.Append('"');
>+  for ( ; s != e; ++s) {
>+    //
>+    // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>+    //
>+    if (((unsigned char)(*s)) <= 31 || *s == 127) {
>+      return NS_ERROR_FAILURE;
>+    }
>+
>+    // Escape two syntactically significant characters
>+    if (*s == '"' || *s == '\\') {
>+      quoted.Append('\\');
>+    }
>+
>+    quoted.Append(*s);
>+  }
>+  quoted.Append('"');
>+  aHeaderLine.Append(quoted);
>+  return NS_OK;
>+}
>+
>+
> // vim: ts=2 sw=2
>diff -r fd264d551130 netwerk/protocol/http/nsHttpDigestAuth.h
>--- a/netwerk/protocol/http/nsHttpDigestAuth.h	Fri Apr 19 07:45:15 2013 -0400
>+++ b/netwerk/protocol/http/nsHttpDigestAuth.h	Mon Apr 22 18:13:28 2013 +0200
>@@ -79,6 +79,11 @@
>     // append the quoted version of value to aHeaderLine
>     nsresult AppendQuotedString(const nsACString & value,
>                                 nsACString & aHeaderLine);
>+    // append the quoted version of value to aHeaderLine, 
>+    // with bytes values from 128 to 255 allowed 
>+    // (to be able to send back the utf8 encoded username)
>+    nsresult AppendUTF8QuotedString(const nsACString & value,
>+                                    nsACString & aHeaderLine);
> 
>   protected:
>     nsCOMPtr<nsICryptoHash>        mVerifier;
Attachment #740318 - Attachment is patch: true
[Ack! Not sure what I fubared to trigger the last comment, I was just setting the ispatch flag...]

I think a global pref is a really bad idea. It makes the browser behave differently to the server in an undetectable way that breaks the spec, isn't a very good option if the user needs to login to a different site which needs the opposite behavior, and all-around feels like a big footgun that can lead to the perception that Firefox is broken (perhaps long after the user has flipped the pref and forgotten about it).

Nor does it really address the problem for the vast majority of users, because we can't expect many people to be using this pref (most people don't change defaults, and also the above).
ggo98: Thank you very much for working on this.

I agree with Dolske regarding the non-value of having a pref. This kind of pref tends to make one website work while another website breaks.

Our goals should be to (a) work as well as we can with existing servers, and (b) push the web forward towards a reasonable way to use UTF-8 for HTTP Auth. For (b) it seems like the new parameter-based mechanism Julian suggested above, tracked in bug 656213, is the way to go there by default.

For (a), perhaps a not-completely-and-totally-horrible solution would be to modify the UI of the HTTP auth prompt so that the user can override the choice of encoding, just like we allow the user to override the choice of encoding in the good ol' charset menu. Is it bad UI? Well, it isn't great UI. However, saying that European and American people can use passwords in their languages, but others can't, is a horrible (at best) policy. And, HTTP basic/digest auth is rarely used, so few users would be exposed to the less-than-optimal-but-important UI.

Even if we were to implement (b), we will still need (a) to cope with servers that don't implement that new mechanism. And, I don't think it matters whether we implement (a) before (b) or vice-versa.

I doubt any Mozilla Corp. employees are going to sit down and fix this problem any time soon because of the other things we're busy with. But, luckily, we have some people who have stepped up to write some code to solve this problem. The pref is not the way to do that, but we should steer people towards a solution that is acceptable. So, here's my non-UX-designer suggested UI:

    Username: [                   ]
    Password: [                   ]
    Encoding: [                ][v]

"Encoding" is a drop-down list of encodings. It is disabled by default and its default value is "Automatic (ASCII)" which means ASCII. If the user types a non-ASCII username or a non-ASCII password, then the Encoding drop-down gets enabled and the user can see a list of encodings that looks similar to the good ol' charset menu, sorted and separated in the same way, which they can choose from. When the user chooses a non-ASCII encoding, then we send the auth data using that encoding, exactly like MSIE encodes it.

Bonus points (perhaps in a follow-up bug): filter the list of encodings in the Encodings list based on the encodings that include the non-ASCII characters that the user typed.

If we ever implement the mechanism tracked by bug 656213, or a similar mechanism, then we will disable the Encodings field when the server uses that mechanism, and set its value to the value that the server chose.

dolske: Does this seem like a reasonable approach, in general, to you? Is there somebody on Firefox or UX team that needs to be involved before we would suggest potential contributors take a go at implementing it, for us to review?

Does anybody else see any tragic flaws with changing the UI in this way, or a similar manner? Anybody got a better idea?
Flags: needinfo?(dolske)
Comment on attachment 740318 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

Review of attachment 740318 [details] [diff] [review]:
-----------------------------------------------------------------

ggo, Thanks for the patch. I am going to r- it, based on the reasons given above. If you disagree and think that I should take a look at this patch, please respond to the comments that dolske and I made above and r?bsmith again. Thanks again!
Attachment #740318 - Flags: review?(bsmith) → review-
Asking the user to deal with encoding issues in UI is just making things worse. :(

TBH, the only path suggested so far that sounds likely to fly is speccing a new auth method (like comment 59) that's explicitly UTF-8, and getting sites to use that. Everything else breaks stuff to some unknown degree.

Someone just needs to do that. It sounds straightforward, and should be pretty simple for servers and clients to implement.
Flags: needinfo?(dolske)
(In reply to Brian Smith (:bsmith) from comment #126)
>     Encoding: [                ][v]

My half-cent: should be hidden until we detect non-ASCII chars, or until we get 401/7.  Explanatory text would be good to show then as well or just tell the user "we are going to try it other way if you failed to auth".  Persisting the option or found encoding is probably a must.

So, not just few lines of code anyway...
Thanks all for your comments :)
I'll try to work on this again if necessary. (Unfortunately I don't have time these days..., so I'll "Reset Assignee to default" if I have the right to do it.)
(In reply to Brian Smith (:bsmith) from comment #127)

Hi Brian,

I don't disagree with the "r-".
I understand the patch is not perfect, and that a pref is not the best way, for the reasons explained (people usually don't play with prefs).
The only point is that I don't understand how it breaks anything, because:
1) without the pref set, only ASCII chars are supported (ISO-8859-1 chars are currently rejected, at least in the code I know: FF 19 & 20)
2) with the pref set (which may be acceptable under certain circumstances), ASCII chars are supported the same way as they are without the pref set, and chars outside the ASCII charset are too... but it's true that this way is not compliant with a standard method.
The most standard way, I think, would be to check the "charset" authentication parameter (already suggested by others and by me) in order to determine whether the Web server supports the UTF-8 charset, as specified in RFC 2831:
"charset
      This directive, if present, specifies that the server supports
      UTF-8 encoding for the username and password. If not present, the
      username and password must be encoded in ISO 8859-1 (of which
      US-ASCII is a subset). The directive is needed for backwards
      compatibility with HTTP Digest, which only supports ISO 8859-1.
      This directive may appear at most once; if multiple instances are
      present, the client should abort the authentication exchange.
"

Thank you,
Olivier.
Assignee: ggo98 → nobody
(In reply to ggo from comment #131)
> ...
> The only point is that I don't understand what it breaks something , because:
> 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> currently rejected, at least with the code I know : FF 19 & 20)

As far as I understand, FF currently uses ISO-8859-1. Are you saying it does not?

> 2) with the pref set (which may be asked/acceptable under certain
> circumstances), ASCII chars are supported the same way as they are without
> the pref set, and chars out of the ASCII charset are too... but it's true
> that this way is not compliant with a standard method.

Changing the default from ISO-8859-1 to UTF-8 will break those sites that rely on it being ISO-8859-1.

> The most standard way I think would be to check the "charset" authentication
> header (already suggested by others and me) in order to check if the Web
> server supports the UTF8 charset, as specified in RFC2831:

That parameter does not apply to Basic and/or Digest. That being said, you may want to help fix the issue in the specs. See <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.html>.

> ...
(In reply to Julian Reschke from comment #132)
> (In reply to ggo from comment #131)
> > ...
> > The only point is that I don't understand what it breaks something , because:
> > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > currently rejected, at least with the code I know : FF 19 & 20)
> 
> As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> not?

Yes, that's what I'm saying, for Digest Authentication.
see nsHttpDigestAuth::AppendQuotedString() in netwerk/protocol/http/nsHttpDigestAuth.cpp:

.
.

    //
    // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
    //
    if (*s <= 31 || *s == 127) {
      return NS_ERROR_FAILURE;
    }
.
.


> 
> > 2) with the pref set (which may be asked/acceptable under certain
> > circumstances), ASCII chars are supported the same way as they are without
> > the pref set, and chars out of the ASCII charset are too... but it's true
> > that this way is not compliant with a standard method.
> 
> Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> rely on it being ISO-8859-1.

Agree with that. Except that according to the code and some tests I have done, it seems that it does not work with ISO-8859-1 at this time.

> 
> > The most standard way I think would be to check the "charset" authentication
> > header (already suggested by others and me) in order to check if the Web
> > server supports the UTF8 charset, as specified in RFC2831:
> 
> That parameter does not apply to Basic and/or Digest. That being said, you
> may want to help fixing the issue in the specs. See
> <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> html>.

Right, it does not apply to basic authentication.
However if I understand correctly, RFC 2831 (which talks about "HTTP Digest Access Authentication") seems to say the "charset" parameter could be used...


> 
> > ...

Thank you,
Olivier.
(In reply to ggo from comment #133)
> (In reply to Julian Reschke from comment #132)
> > (In reply to ggo from comment #131)
> > > ...
> > > The only point is that I don't understand what it breaks something , because:
> > > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > > currently rejected, at least with the code I know : FF 19 & 20)
> > 
> > As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> > not?
> 
> Yes, that's what I'm saying, for Digest Authentication.
> see nsHttpDigestAuth::AppendQuotedString() in
> netwerk/protocol/http/nsHttpDigestAuth.cpp:
> 
> .
> .
> 
>     //
>     // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>     //
>     if (*s <= 31 || *s == 127) {
>       return NS_ERROR_FAILURE;
>     }
> .
> .

What does rejecting control characters have to do with ISO-8859-1 support?

> > > 2) with the pref set (which may be asked/acceptable under certain
> > > circumstances), ASCII chars are supported the same way as they are without
> > > the pref set, and chars out of the ASCII charset are too... but it's true
> > > that this way is not compliant with a standard method.
> > 
> > Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> > rely on it being ISO-8859-1.
> 
> Agree with that. Except that according to the code and some tests I have
> done, it seems that it does not work with ISO-8859-1 at this time.

Then it's probably time to write tests first.

> > > The most standard way I think would be to check the "charset" authentication
> > > header (already suggested by others and me) in order to check if the Web
> > > server supports the UTF8 charset, as specified in RFC2831:
> > 
> > That parameter does not apply to Basic and/or Digest. That being said, you
> > may want to help fixing the issue in the specs. See
> > <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> > html>.
> 
> Right, it does not apply to basic authentication.
> However if I understand correctly, RFC2831 (which talks about "HTTP Digest
> Access
>    Authentication") seems to say the "charset" parameter could be used...

But that RFC is not about the Digest HTTP auth mechanism.

Yes, the HTTPAUTH WG might end up adopting this approach for HTTP as well, but we're not there yet.

That being said, *if* we use parameters, we'll have first to fix the code that parses the WWW-Authenticate header field. See test cases at <http://greenbytes.de/tech/tc/httpauth/>.
(In reply to Julian Reschke from comment #134)
> (In reply to ggo from comment #133)
> > (In reply to Julian Reschke from comment #132)
> > > (In reply to ggo from comment #131)
> > > > ...
> > > > The only point is that I don't understand what it breaks something , because:
> > > > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > > > currently rejected, at least with the code I know : FF 19 & 20)
> > > 
> > > As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> > > not?
> > 
> > Yes, that's what I'm saying, for Digest Authentication.
> > see nsHttpDigestAuth::AppendQuotedString() in
> > netwerk/protocol/http/nsHttpDigestAuth.cpp:
> > 
> > .
> > .
> > 
> >     //
> >     // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
> >     //
> >     if (*s <= 31 || *s == 127) {
> >       return NS_ERROR_FAILURE;
> >     }
> > .
> > .
> 
> What does rejecting control characters have to do with ISO-8859-1 support?

you're right, that is what this code is supposed to do. But since s iterates over an nsACString (i.e. plain chars), and since char is signed (at least by default), the "*s <= 31" test is also true for any char whose value would be >= 128 if it were stored in an unsigned char. So in the end, the only allowed byte values are those greater than 31 and less than 127.
(*** at least in the FF version 20 code I have; maybe it has been changed in FF 21, I did not check yet...)
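A minimal standalone snippet (my own, not from the tree) demonstrating the pitfall:

#include <cstdio>

int main() {
  char s = '\xC3';  // first byte of the UTF-8 encoding of "é" (value 195)
  // Where plain char is signed, s holds -61, so the CTL test is true
  // and the byte is (wrongly) rejected as a control character.
  std::printf("s <= 31: %d\n", s <= 31);
  // Casting through unsigned char first gives the intended comparison.
  std::printf("(unsigned char)s <= 31: %d\n",
              static_cast<unsigned char>(s) <= 31);
  return 0;
}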

> 
> > > > 2) with the pref set (which may be asked/acceptable under certain
> > > > circumstances), ASCII chars are supported the same way as they are without
> > > > the pref set, and chars out of the ASCII charset are too... but it's true
> > > > that this way is not compliant with a standard method.
> > > 
> > > Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> > > rely on it being ISO-8859-1.
> > 
> > Agree with that. Except that according to the code and some tests I have
> > done, it seems that it does not work with ISO-8859-1 at this time.
> 
> Then it's probably time to write tests first.

probably yes.

> 
> > > > The most standard way I think would be to check the "charset" authentication
> > > > header (already suggested by others and me) in order to check if the Web
> > > > server supports the UTF8 charset, as specified in T:
> > > 
> > > That parameter does not apply to Basic and/or Digest. That being said, you
> > > may want to help fixing the issue in the specs. See
> > > <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> > > html>.
> > 
> > Right, it does not apply to basic authentication.
> > However if I understand correctly, RFC2831 (which talks about "HTTP Digest
> > Access
> >    Authentication") seems to say the "charset" parameter could be used...
> 
> But that RFC is not about the Digest HTTP auth mechanism.
OK.

> 
> Yes, the HTTPAUTH WG might end up adopting this approach for HTTP as well,
> but we're not there yet.
> 
> That being said, *if* we use parameters, we'll have first to fix the code
> that parses the WWW-Authenticate header field. See test cases at
> <http://greenbytes.de/tech/tc/httpauth/>.

OK.

Thanks for the feedback :)
This is incompatible with Opera and Chromium (they just send the username as raw UTF-8 bytes, without transport encoding), but it matches http://tools.ietf.org/html/draft-ietf-httpauth-digest-15. Even though draft-ietf-httpauth-digest-15 is only a draft, it is unlikely to change significantly at this point, so at least the charset parameter can already be used.
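For reference, the extended notation in that draft percent-encodes the UTF-8 bytes of the username, along these lines (header values are illustrative):

Authorization: Digest username*=UTF-8''J%C3%A4s%C3%B8n%20Doe, realm="http-auth@example.org", ...

so only ASCII ever appears on the wire, at the cost of requiring servers that understand RFC 5987-style extended parameters.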
Attachment #8587817 - Flags: review?(brian)
Attachment #8587817 - Flags: review?(brian) → review?(mcmanus)
Comment on attachment 8587817 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15

Review of attachment 8587817 [details] [diff] [review]:
-----------------------------------------------------------------

honza, I think you have the most domain knowledge here and have shown an interest in 656213.. but let me know if the review should be rerouted
Attachment #8587817 - Flags: review?(mcmanus) → review?(honzab.moz)
I'll try to get to this within a few days.
Assignee: nobody → 8f7pfg
Status: NEW → ASSIGNED
FYI: review is in progress!
Review draft lost!!!!  I have to start all over again!
Comment on attachment 8589323 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15 (refreshed as it should be)

Review of attachment 8589323 [details] [diff] [review]:
-----------------------------------------------------------------

There are plenty of formatting issues.  The original patch was not submitted with 8 lines of context and function names.  Please read https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style and maybe also https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +75,3 @@
>      nsAutoCString userpass;
> +    nsAutoCString charset;
> +    if(strstr(challenge,"charset=")){

please don't use strstr or any of the CRT string functions.  Use mozilla::Tokenizer (see bug 1024056).  The challenge header should be parsed correctly.

@@ +75,4 @@
>      nsAutoCString userpass;
> +    nsAutoCString charset;
> +    if(strstr(challenge,"charset=")){
> +	const char *p = strstr(challenge,"charset=") + 8;

this is very unsafe...

@@ +95,5 @@
> +	if (password)
> +		AppendUTF16toUTF8(password, userpass);
> +    }
> +    else{
> +	//actually it is ISO-8859-1

if charset == "" (or there is no "charset" option in the challenge), otherwise on unknown values we should fail the authentication.  Rather be safe.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ -61,2 @@
>    if (NS_FAILED(rv)) return rv;
> -

please leave the blank lines that make the code more readable.  and add some to your new code as well.

@@ -204,5 @@
>    if (NS_FAILED(rv)) {
>      LOG(("nsHttpDigestAuth::GenerateCredentials [ParseChallenge failed rv=%x]\n", rv));
>      return rv;
>    }
> -

again, why are you unreasonably removing blank lines?

@@ +302,5 @@
> +  else{
> +	//this true lossy convert UTF16 to ASCII
> +	//ASCII is an 7-bit encoding
> +    const char16_t *p = username;
> +		cUser.Assign(*p % 128);

you really can't do this...  who is using real 7bit ascii these days?  the correct way is escaping and it's probably not what you want here, or yes?

TODO for me: check the spec

@@ +338,5 @@
> +  if(userhash == 2){
> +	  char hashuser[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +    	  cUser.Append(":");
> +    	  cUser.Append(realm);
> +	  aHash(cUser.get(), cUser.Length(), algorithm);

check result

@@ +345,5 @@
> +	  authString += hashuser;
> +	  authString += '\"';
> +	  goto appendRealm;
> +  }
> +  if(charset.EqualsLiteral("ISO-8859-1") || charset.EqualsLiteral("UTF-8")){

please use else if () instead of a goto here.

@@ +353,5 @@
> +	  else{
> +		authString.AssignLiteral("Digest username*=UTF-8\'\'");
> +	  }
> +	  nsAutoCString escUser;
> +	  NS_Escape(cUser, escUser,  url_XAlphas);

TODO for me: check this escaping

@@ +453,5 @@
> +	  len = 2*EXPANDED_DIGEST_SHA256_LENGTH + nonce.Length() + 2;
> +  }
> +  else{
> +	  len = 2*EXPANDED_DIGEST_LENGTH + nonce.Length() + 2;
> +  }

please have functions DigestLength(algorithm) and ExpandedDigestLength(algorithm).  It will make the code much cleaner and clearer in many places.  When |algorithm| is not one of the expected values, make those functions MOZ_ASSERT(false);

These particular lines should then be:
uint32_t len = nonce.Length() + 2 + ExpandedDigestLength(algorithm) * 2;

@@ +454,5 @@
> +  }
> +  else{
> +	  len = 2*EXPANDED_DIGEST_LENGTH + nonce.Length() + 2;
> +  }
> +  if (qop == QOP_AUTH || qop == QOP_AUTH_INT) {

TODO for me: check this condition

@@ -468,2 @@
>    nsAutoCString contents;
> -  contents.SetCapacity(len + 1);

all these calculations are perf optimizations.  but definitely not critical.

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +84,3 @@
>  
>      // result is in mHashBuf
> +    nsresult aHash(const char *buf, uint32_t len, uint16_t algorithm);

Hash or DoHash
Attachment #8589323 - Flags: review-
Comment on attachment 8587817 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15

Please see review comments in comment 145.  I'm really sorry for such a huge delay.  This patch simply got lost among all my other reviews and higher-priority work.  I'll do better next time.
Attachment #8587817 - Flags: review?(honzab.moz) → review-
Whiteboard: [necko-would-take]
About it:

> +  else{
> +	//this true lossy convert UTF16 to ASCII
> +	//ASCII is an 7-bit encoding
> +    const char16_t *p = username;
> +		cUser.Assign(*p % 128);

you really can't do this...  who is using real 7bit ascii these days?  the correct way is escaping and it's probably not what you want here, or yes?

I cannot do otherwise. Escaping requires "charset" in order to use the extended notation username*=charset''username; without it, we are required to use an ASCII quoted-string, or not use the ABNF at all.
Attachment #8714628 - Flags: review?(honzab.moz)
First of all, the internet draft is RFC 7616 (for digest auth)/7617 (for basic auth) now.

(In reply to for nothing from comment #147)
> I can not do otherwise. For escaping required "charset" to use extended
> notation username*=charset''username, without it required to use quoted
> string ascii or do not use at all ABNF.

You don't have to care about charset other than UTF-8. UTF-8 is the only valid value per RFC 7616. For all other charset values (or servers that do not send the charset parameter), just use |username="<username without escape>"| as before (that is, the RFC 2617 compatible header). The quoted-string production can have non-ASCII octets.

https://tools.ietf.org/html/rfc7230#section-3.2.6
>     quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
>     qdtext         = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
>     obs-text       = %x80-FF

Although obs-text is obsolete, the RFC 2617 compatible header is obsolete by definition anyway. It will still be needed for compatibility with old servers. Servers unaware of RFC 7616 are unlikely to support "username*", let alone "userhash".
Comment on attachment 8714628 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest

The patch is malformed at the binary level and cannot be applied.  Please submit the result of hg diff -U 8 -p (or qdiff).
Attachment #8714628 - Flags: review?(honzab.moz)
Attached patch to do here as rfc 7616 (obsolete) — Splinter Review
Attachment #8587817 - Attachment is obsolete: true
Attachment #8714628 - Attachment is obsolete: true
Attachment #8717027 - Flags: review?(honzab.moz)
Comment on attachment 8717027 [details] [diff] [review]
to do here as rfc 7616

Review of attachment 8717027 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the update, and my apologies for taking so long to respond.

I'll ask you to do one major thing: please split this patch into two pieces, one for each type of authentication.  One for basic auth, which I believe will be easier to bring to a final state, and a second for digest auth.  This will make progress on this bug faster.  Thanks.

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +78,5 @@
> +    nsAutoCString charset;
> +    Tokenizer p(challenge);
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.Next(t) && (t.AsChar() == '=')) {

Note: p.Next(t) && (t.AsChar() == '=') can better be written as p.CheckChar('=')

@@ +80,5 @@
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.Next(t) && (t.AsChar() == '=')) {
> +        p.Record();
> +        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')));

Note: now we have a convenient ReadUntil method

@@ +84,5 @@
> +        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')));
> +        p.Claim(charset);
> +        charset.StripChar('"', 0);
> +      }
> +    }

You must respect the grammar here.  This (if your code were written correctly, by the way) will take any occurrence of "charset" within the string.

The grammar is:

Schema header="value"[,header="value"]*

Also allows white spaces (including cr/lf) between each element.  I think we are OK with having just a necessary subset of https://tools.ietf.org/html/rfc5234#section-4

Roughly looking at the spec, seems like header is what Tokenizer recognizes as "word" with added "-" as word char (see the constructor).  value seems to be anything between the quotes delimited by ','.  Not sure how quotes are escaped.

You will need to write a better loop for it or even a simple recursive descent (optionally by deriving from Tokenizer).  You have to walk the headers one by one and pick charset value.

@@ +91,5 @@
> +      userpass.Append(':');
> +      if (password) {
> +        AppendUTF16toUTF8(password, userpass);
> +      }
> +    } else {

we must fail when charset is found non-null and something else than "UTF-8".  The spec seems to allow only that value.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ +212,5 @@
>  
> +  char ha1_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char ha2_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char response_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char upload_data_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];

I'd slightly prefer to have a LONGEST_DIGEST_LENGTH define that is simply defined as the longest of the defined hash lengths, so that when a new one is added, we simply redefine it (when needed) and don't crash.

@@ +319,5 @@
> +    for (;*p != 0;p++) {
> +      if (*p % 128 >= 32) {
> +        cUser.Append(*p % 128);
> +      }
> +    }

can you please explain what you are doing here?  we might already have some helper functions for this (if I knew what's going on).

@@ +358,5 @@
>    //
>  
>    nsAutoCString authString;
>  
> +  if (userhash == 2) {

please have defines for possible values of |userhash| to make it clear what is happening here.

@@ +374,5 @@
> +    if (charset.EqualsLiteral("ISO-8859-1")) {
> +      authString.AssignLiteral("Digest username*=ISO-8859-1\'\'");
> +    } else {
> +      authString.AssignLiteral("Digest username*=UTF-8\'\'");
> +    }

maybe: 

authString.AssignLiteral("Digest username*=")
authString.Append(charset);

?

@@ +405,2 @@
>    } else {
> +    authString.AppendLiteral("MD5");

We need to know when we don't understand the algorithm.  But here, for unknown values you simply fall back to MD5, which is the default only when the "algorithm" header is not present at all.  That is the whole purpose of the ALGO_SPECIFIED flag, which your patch effectively removes.

@@ +722,5 @@
>      else if (nameLength == 9 &&
>          nsCRT::strncasecmp(challenge+nameStart, "algorithm", 9) == 0)
>      {
>        // we want to clear the default, so we use = not |= here
>        *algorithm = ALGO_SPECIFIED;

with this usage, this should be renamed to ALGO_SPECIFIED_UNKNOWN

@@ +739,3 @@
>        else if (valueLength == 8 &&
> +               nsCRT::strncasecmp(challenge+valueStart, "MD5-sess", 8) == 0) {
> +        *algorithm = ALGO_MD5_SESS;

should we pick the strongest algo when this header is present more than once?  or should we ignore the challenge completely when it is present more than once?

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +24,4 @@
>  #define QOP_AUTH 0x01
>  #define QOP_AUTH_INT 0x02
>  
>  #define DIGEST_LENGTH 16

rename to DIGEST_MD5_LENGTH

@@ +91,5 @@
>      nsresult AppendQuotedString(const nsACString & value,
>                                  nsACString & aHeaderLine);
>  
> +    int16_t DigestLength(int16_t algorithm);
> +    int16_t ExpadedDigestLength(int16_t algorithm);

Expanded? (typo)
Attachment #8717027 - Flags: review?(honzab.moz) → review-
Attached patch patch for basic auth (obsolete) — Splinter Review
I do not know whether the patch for basic authentication should go to a separate bug, so I'll leave it here.
Attachment #8717027 - Attachment is obsolete: true
Attachment #8750616 - Flags: review?(honzab.moz)
Attached patch patch for digest auth (obsolete) — Splinter Review
This for digest auth
Attachment #8750619 - Flags: review?(honzab.moz)
Comment on attachment 8750616 [details] [diff] [review]
patch for basic auth

Review of attachment 8750616 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +78,5 @@
> +    nsAutoCString charset;
> +    Tokenizer p(challenge);
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.CheckChar('=')) {

This is still not a grammar-like parsing loop, sorry.  Also, doing t.AsString() on a returned token w/o checking its type first is wrong.  The correct way to check that it's a word with the expected content is to do:

t.Equals(Tokenizer::Token::Word("charset"))

which does the correct type and value checks for you.  Just read the comments in https://mxr.mozilla.org/mozilla-central/source/xpcom/ds/Tokenizer.h to better understand how it works.

@@ +91,5 @@
> +    }
> +    if (!charset.IsEmpty()) {
> +      if (charset.EqualsLiteral("UTF-8")) {
> +        CopyUTF16toUTF8(user, userpass);
> +        userpass.Append(':');

please leave the "// always send a ':' (see bug 129565)" comment
Attachment #8750616 - Flags: review?(honzab.moz) → review-
Attachment #521605 - Attachment is obsolete: true
Attachment #740318 - Attachment is obsolete: true
Attachment #8589323 - Attachment is obsolete: true
Attachment #8750616 - Attachment is obsolete: true
Comment on attachment 8750616 [details] [diff] [review]
patch for basic auth

ups.
Attachment #8750616 - Attachment is obsolete: false
Comment on attachment 8750619 [details] [diff] [review]
patch for digest auth

Review of attachment 8750619 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.  This looks mostly good, just a few formatting issues and details to be updated.

The other thing we need is a test.  Probably just expand https://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/test_authentication.js to also exercise the SHA-256 code.  The more corners of your added/modified code you cover, the better; ideally all of them.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ +59,5 @@
>    }
>  
> +  if (algorithm & ALGO_SHA256 || algorithm & ALGO_SHA256_SESS) {
> +    rv = mVerifier->Init(nsICryptoHash::SHA256);
> +  } else {

for safety, you should check for the ALGO_MD5* flags here as well.  and NS_ERROR("Algorithm not set")+return NS_ERROR_UNEXPECTED when none of the flags is found.

@@ +296,5 @@
>    // calculate credentials
>    //
>  
> +  if (!charset.IsEmpty()) {
> +    if (!charset.EqualsLiteral("UTF-8")) {

if (!charset.IsEmpty() && !charset.EqualsLiteral("UTF-8"))

@@ +324,5 @@
>    //
>  
>    nsAutoCString authString;
>  
> +  if (userhash & USERHASH_TRUE) {

just == ?

@@ +329,5 @@
> +    char hashuser[EXPANDED_DIGEST_LENGTH+1];
> +    cUser.Append(":");
> +    cUser.Append(realm);
> +    rv = DoHash(cUser.get(), cUser.Length(), algorithm);
> +    if (NS_FAILED(rv)) return rv;

if (NS_FAILED(rv)) {
  return rv;
}

or better

NS_ENSURE_SUCCESS(rv, rv);

@@ +334,5 @@
> +    ExpandToHex(mHashBuf, algorithm, hashuser);
> +    authString.AssignLiteral("Digest username=\"");
> +    authString += hashuser;
> +    authString += '\"';
> +  } else if (charset.EqualsLiteral("UTF-8")) {

would be nice to somehow cache this test

@@ +339,5 @@
> +    authString.AssignLiteral("Digest username*=");
> +    authString.Append(charset);
> +    authString.AppendLiteral("\'\'");
> +    nsAutoCString escUser;
> +    NS_Escape(cUser, escUser,  url_XAlphas);

two spaces after the second comma

@@ +400,5 @@
>    }
>  
> +  if (userhash) {
> +    authString.AppendLiteral(", userhash=");
> +    if (userhash & USERHASH_TRUE) {

userhash == USERHASH_TRUE ?  I didn't really mean these should be flags, just named constants to make it clearer what's going on.  They should not be combined.

@@ +475,3 @@
>  {
> +  int16_t index, value, digestLen;
> +  digestLen = DigestLength(algorithm);

nit, maybe just:

int16_t index, value, digestLen = DigestLength(algorithm);

@@ +489,5 @@
>      else
>        result[(index*2)+1] = value - 10 + 'a';
>    }
>  
> +  result[digestLen * 2] = 0;

ExpandedDigestLength(algorithm) instead of digestLen * 2 please

@@ +601,4 @@
>    *stale = false;
>    *algorithm = ALGO_MD5; // default is MD5
>    *qop = 0;
> +  uint16_t algo = 0;

if you really need this, please declare before first use

@@ +668,5 @@
> +    else if (nameLength == 7 &&
> +	      nsCRT::strncasecmp(challenge+nameStart, "charset", 7) == 0)
> +    {
> +      if (!charset.EqualsLiteral("UTF-8")) {
> +        charset.Assign(challenge+valueStart, valueLength);

please explain what the condition exactly means.  are you just ignoring everything but UTF-8?  a comment is appropriate here.

@@ +713,5 @@
> +        algo = ALGO_MD5_SESS;
> +      }
> +      if (*algorithm < algo){
> +      *algorithm |= algo;
> +      }

isn't it OK to just always do *algorithm |= algo;?  we select strongest algo later in GenerateCredentials anyway.

nit: indentation and spaces.

@@ +779,5 @@
> +nsHttpDigestAuth::DigestLength(int16_t algorithm)
> +{
> +  MOZ_ASSERT(algorithm >= ALGO_SPECIFIED && algorithm <= ALGO_SHA256_SESS + 1);
> +  int16_t len;
> +  if (algorithm & ALGO_SHA256 || algorithm & ALGO_SHA256_SESS) {

nit: algorithm & (ALGO_SHA256 | ALGO_SHA256_SESS)

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +30,3 @@
>  #define NONCE_COUNT_LENGTH 8
> +#define USERHASH_FALSE 0x01
> +#define USERHASH_TRUE 0x02

We may also want USERHASH_UNKNOWN or _NOT_SET defined as 0 which would be the default value.
Attachment #8750619 - Flags: review?(honzab.moz) → feedback+
Attached patch patch for digest (obsolete) — Splinter Review
Sorry for the delay.
Attachment #8750616 - Attachment is obsolete: true
Attachment #8750619 - Attachment is obsolete: true
Attachment #8821682 - Flags: review?(bzbarsky)
Attached patch patch for test digest (obsolete) — Splinter Review
Attachment #8821683 - Flags: review?(bzbarsky)
> This is still not a grammar-like parsing loop, sorry.

Still not sure that I understood correctly. Need something like this?

NS_IMETHODIMP
nsHttpBasicAuth::ParseCharset(const char *challenge, nsACString &charset)
{
  const char *p = challenge + 5; // skip the "Basic" scheme token
  while (*p) {
    // skip separators and whitespace between parameters
    while (*p && (*p == ',' || nsCRT::IsAsciiSpace(*p))) {
      ++p;
    }
    if (!*p) {
      break;
    }
    int32_t nameStart = (p - challenge);
    while (*p && !nsCRT::IsAsciiSpace(*p) && *p != '=') {
      ++p;
    }
    if (!*p) {
      return NS_ERROR_INVALID_ARG;
    }
    int32_t nameLength = (p - challenge) - nameStart;
    while (*p && (nsCRT::IsAsciiSpace(*p) || *p == '=')) {
      ++p;
    }
    if (!*p) {
      return NS_ERROR_INVALID_ARG;
    }
    bool quoted = false;
    if (*p == '"') {
      ++p;
      quoted = true;
    }
    int32_t valueStart = (p - challenge);
    int32_t valueLength = 0;
    if (quoted) {
      while (*p && *p != '"') {
        ++p;
      }
      if (*p != '"') {
        return NS_ERROR_INVALID_ARG;
      }
      valueLength = (p - challenge) - valueStart;
      ++p;
    } else {
      while (*p && !nsCRT::IsAsciiSpace(*p) && *p != ',') {
        ++p;
      }
      valueLength = (p - challenge) - valueStart;
    }
    if (nameLength == 7 &&
        nsCRT::strncasecmp(challenge+nameStart, "charset", 7) == 0)
    {
      charset.Assign(challenge+valueStart, valueLength);
    }
  }
  return NS_OK;
}

only with Tokenizer?

(In reply to Honza Bambas (:mayhemer) [off till Jan 4] from comment #154)
> Comment on attachment 8750616 [details] [diff] [review]
> patch for basic auth
> 
> Review of attachment 8750616 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/nsHttpBasicAuth.cpp
> @@ +78,5 @@
> > +    nsAutoCString charset;
> > +    Tokenizer p(challenge);
> > +    Tokenizer::Token t;
> > +    while (p.Next(t)) {
> > +      if (t.AsString() == "charset" && p.CheckChar('=')) {
> 
> This is still not a grammar-like parsing loop, sorry.  Also, doing
> t.AsString() on a returned token w/o checking its type first is wrong.  The
> correct way to check it's a word and has an expected content is to do:
> 
> t.Equals(Tokenizer::Token::Word("charset"))
> 
> which does the correct type and value checks for you.  Just read the
> comments in
> https://mxr.mozilla.org/mozilla-central/source/xpcom/ds/Tokenizer.h to more
> understand how it works.
> 
> @@ +91,5 @@
> > +    }
> > +    if (!charset.IsEmpty()) {
> > +      if (charset.EqualsLiteral("UTF-8")) {
> > +        CopyUTF16toUTF8(user, userpass);
> > +        userpass.Append(':');
> 
> please leave the "// always send a ':' (see bug 129565)" comment

Comment on attachment 8821682 [details] [diff] [review]
patch for digest

I'm not the right reviewer for this.  Honza seems to be out, so over to the module owner.
Attachment #8821682 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Comment on attachment 8821683 [details] [diff] [review]
patch for test digest

Er, I guess Patrick is the owner.
Attachment #8821683 - Flags: review?(bzbarsky) → review?(mcmanus)
Attachment #8821682 - Flags: review?(jduell.mcbugs) → review?(mcmanus)
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #160)
> Comment on attachment 8821682 [details] [diff] [review]
> patch for digest
> 
> I'm not the right reviewer for this.  Honza seems to be out, so over to the
> module owner.

I looked here https://www.mozilla.org/en-US/about/governance/policies/reviewers/, maybe it's not right.
Attached patch patch for basic (obsolete) — Splinter Review
Attachment #8822552 - Flags: review?(mcmanus)
Attached patch patch for test basic (obsolete) — Splinter Review
Attachment #8822553 - Flags: review?(mcmanus)
> I looked here https://www.mozilla.org/en-US/about/governance/policies/reviewers/, maybe it's not right.

That's about super-review, which is not the same as module owner review and is pretty much obsolete.

What you want is https://wiki.mozilla.org/Modules/Core#Necko and see the owner and peer list.
Honza is really the right reviewer here. Matching Chrome is the right thing to do in this case.
Attachment #8821682 - Flags: review?(mcmanus)
Attachment #8821683 - Flags: review?(mcmanus)
Attachment #8822552 - Flags: review?(mcmanus)
Attachment #8822553 - Flags: review?(mcmanus)
Matching Chrome might break existing deployments where servers do user agent sniffing to decide which encoding to use. See <https://greenbytes.de/tech/webdav/rfc7617.html#rfc.section.B.3>.
...to r? me on the patches (thanks for them!) and answer all the questions here.
Flags: needinfo?(honzab.moz)
Case-insensitivity? RFC 7617 says:
Note that both scheme and parameter names are matched case-insensitively.
The only allowed value is "UTF-8"; it is to be matched case-insensitively.
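In Gecko string terms, that check could be e.g. value.LowerCaseEqualsLiteral("utf-8") (a sketch; it assumes the parsed parameter value has landed in an nsCString named value).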
Attachment #8821682 - Flags: review?(honzab.moz)
(In reply to guntiso from comment #169)
> Case-insensitivity? RFC 7617 says:
> Note that both scheme and parameter names are matched case-insensitively.
> The only allowed value is "UTF-8"; it is to be matched case-insensitively.

I was waiting for this question to arise. Maybe.
Attachment #8821683 - Flags: review?(honzab.moz)
Attachment #8822552 - Flags: review?(honzab.moz)
Comment on attachment 8822552 [details] [diff] [review]
patch for basic

>diff -r 79ef93672445 netwerk/protocol/http/nsHttpBasicAuth.cpp
>--- a/netwerk/protocol/http/nsHttpBasicAuth.cpp	Thu Dec 29 12:03:47 2016 -0800
>+++ b/netwerk/protocol/http/nsHttpBasicAuth.cpp	Fri Dec 30 04:45:00 2016 +0600
>@@ -5,16 +5,17 @@
> 
> // HttpLog.h should generally be included first
> #include "HttpLog.h"
> 
> #include "nsHttpBasicAuth.h"
> #include "plbase64.h"
> #include "plstr.h"
> #include "nsString.h"
>+#include "mozilla/Tokenizer.h"
> 
> namespace mozilla {
> namespace net {
> 
> //-----------------------------------------------------------------------------
> // nsHttpBasicAuth <public>
> //-----------------------------------------------------------------------------
> 
>@@ -82,22 +83,36 @@ nsHttpBasicAuth::GenerateCredentials(nsI
>     NS_ENSURE_ARG_POINTER(creds);
> 
>     *aFlags = 0;
> 
>     // we only know how to deal with Basic auth for http.
>     bool isBasicAuth = !PL_strncasecmp(challenge, "basic", 5);
>     NS_ENSURE_TRUE(isBasicAuth, NS_ERROR_UNEXPECTED);
> 
>-    // we work with ASCII around here
>     nsAutoCString userpass;
>-    LossyCopyUTF16toASCII(user, userpass);
>-    userpass.Append(':'); // always send a ':' (see bug 129565)
>-    if (password)
>+    uint16_t charset;
>+    nsresult rv = ParseCharset(challenge, &charset);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    if (charset) {
>+      if (charset & CHARSET_UTF8) {
>+        CopyUTF16toUTF8(user, userpass);
>+        userpass.Append(':');
>+        if (password) {
>+          AppendUTF16toUTF8(password, userpass);
>+        }
>+      } else {
>+        return NS_ERROR_UNEXPECTED;
>+      }
>+    } else {
>+      LossyCopyUTF16toASCII(user, userpass);
>+      userpass.Append(':'); // always send a ':' (see bug 129565)
>+      if (password)
>         LossyAppendUTF16toASCII(password, userpass);
>+    }
> 
>     // plbase64.h provides this worst-case output buffer size calculation.
>     // use calloc, since PL_Base64Encode does not null terminate.
>     *creds = (char *) calloc(6 + ((userpass.Length() + 2)/3)*4 + 1, 1);
>     if (!*creds)
>         return NS_ERROR_OUT_OF_MEMORY;
> 
>     memcpy(*creds, "Basic ", 6);
>@@ -107,10 +122,60 @@ nsHttpBasicAuth::GenerateCredentials(nsI
> 
> NS_IMETHODIMP
> nsHttpBasicAuth::GetAuthFlags(uint32_t *flags)
> {
>     *flags = REQUEST_BASED | REUSABLE_CREDENTIALS | REUSABLE_CHALLENGE;
>     return NS_OK;
> }
> 
>+NS_IMETHODIMP
>+nsHttpBasicAuth::ParseCharset(const char *challenge, uint16_t * charset)
>+{
>+    const char *ch = challenge + 5;
>+    *charset = CHARSET_NOT_SET;
>+    Tokenizer p(ch);
>+    Tokenizer::Token t;
>+    nsAutoCString name;
>+    nsAutoCString value;
>+    while (p.Next(t)) {
>+      p.Rollback();
>+      while (p.CheckChar(',') || p.CheckWhite());
>+      if (!p.Next(t) || t.Type() == Tokenizer::TOKEN_EOF) {
>+        break;
>+      }
>+      p.Rollback();
>+      p.Record();
>+      while (p.Next(t) && !t.Equals(Tokenizer::Token::Char('=')) &&
>+             t.Type() != Tokenizer::TOKEN_WS);
>+      if (p.HasFailed()) {
>+        return NS_ERROR_INVALID_ARG;
>+      }
>+      p.Claim(name);
>+      p.Rollback();
>+      p.SkipWhites();
>+      if (!p.CheckChar('=')) {
>+        return NS_ERROR_INVALID_ARG;
>+      }
>+      p.SkipWhites();
>+      bool quoted = p.CheckChar('"');
>+      if (quoted) {
>+        if (!p.ReadUntil(Tokenizer::Token::Char('"'), value)) {
>+          return NS_ERROR_INVALID_ARG;
>+        }
>+      } else {
>+        p.Record();
>+        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')) &&
>+               t.Type() != Tokenizer::TOKEN_WS);
>+        p.Claim(value);
>+      }
>+      if (name.EqualsLiteral("charset")) {
>+        *charset = CHARSET_SPECIFIED;
>+        if (!PL_strncasecmp(value.get(), "UTF-8", 5)) {
>+          *charset |= CHARSET_UTF8;
>+        }
>+      }
>+    }
>+  return NS_OK;
>+}
>+
> } // namespace net
> } // namespace mozilla
>diff -r 79ef93672445 netwerk/protocol/http/nsHttpBasicAuth.h
>--- a/netwerk/protocol/http/nsHttpBasicAuth.h	Thu Dec 29 12:03:47 2016 -0800
>+++ b/netwerk/protocol/http/nsHttpBasicAuth.h	Fri Dec 30 04:45:00 2016 +0600
>@@ -5,28 +5,33 @@
> 
> #ifndef nsBasicAuth_h__
> #define nsBasicAuth_h__
> 
> #include "nsIHttpAuthenticator.h"
> 
> namespace mozilla { namespace net {
> 
>+#define CHARSET_NOT_SET 0x00
>+#define CHARSET_SPECIFIED 0x01
>+#define CHARSET_UTF8 0x02
>+
> //-----------------------------------------------------------------------------
> // The nsHttpBasicAuth class produces HTTP Basic-auth responses for a username/
> // (optional)password pair, BASE64("user:pass").
> //-----------------------------------------------------------------------------
> 
> class nsHttpBasicAuth : public nsIHttpAuthenticator
> {
> public:
>     NS_DECL_ISUPPORTS
>     NS_DECL_NSIHTTPAUTHENTICATOR
> 
> 	nsHttpBasicAuth();
> private:
> 	virtual ~nsHttpBasicAuth();
>+        NS_IMETHODIMP ParseCharset(const char *challenge, uint16_t * charset);
> };
> 
> } // namespace net
> } // namespace mozilla
> 
> #endif // !nsHttpBasicAuth_h__
Attachment #8822552 - Flags: review?(honzab.moz)
Attached patch patch for basic (obsolete) — Splinter Review
Yes, it's a mess, sorry.
Attachment #8822552 - Attachment is obsolete: true
Attachment #8822552 - Flags: review?(honzab.moz)
Attachment #8824314 - Flags: review?(honzab.moz)
Attachment #8822553 - Flags: review?(honzab.moz)
(In reply to guntiso from comment #169)
> Case-insensitivity?

Fixed
Can I delete messages?
Attached patch patch for basic (obsolete) — Splinter Review
Stop me, please.
Attachment #8824314 - Attachment is obsolete: true
Attachment #8824314 - Flags: review?(honzab.moz)
Attachment #8824354 - Flags: review?(honzab.moz)
Unstoppable ;) Note that parameter names (i.e. "charset") are matched case-insensitively.
(In reply to guntiso from comment #176)
> Unstoppable ;) Note that parameter names (i.e. "charset") are matched
> case-insensitively

Ok.
Attached patch patch for basic (obsolete) — Splinter Review
Attachment #8824354 - Attachment is obsolete: true
Attachment #8824354 - Flags: review?(honzab.moz)
Attachment #8824634 - Flags: review?(honzab.moz)
Just a note that I didn't forget about this bug.  It's lower priority, tho.
Comment on attachment 8824634 [details] [diff] [review]
patch for basic

Review of attachment 8824634 [details] [diff] [review]:
-----------------------------------------------------------------

thanks, this looks better, let's just make the parser code cleaner and more readable.

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +91,3 @@
>      nsAutoCString userpass;
> +    uint16_t charset;
> +    nsresult rv = ParseCharset(challenge, &charset);

nit: please define |nsresult rv| as the first thing in the method.

@@ +91,5 @@
>      nsAutoCString userpass;
> +    uint16_t charset;
> +    nsresult rv = ParseCharset(challenge, &charset);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    if (charset) {

if (charset & CHARSET_SPECIFIED)

@@ +109,1 @@
>          LossyAppendUTF16toASCII(password, userpass);

please always enclose body (in the code you add or directly touch, as is here):

if () {
}

@@ +129,5 @@
>  
> +NS_IMETHODIMP
> +nsHttpBasicAuth::ParseCharset(const char *challenge, uint16_t * charset)
> +{
> +    const char *ch = challenge + 5;

what is the '+ 5' for?  the tokenizer must skip anything you want to ignore; its whole purpose is to shake out any need for unsafe pointer arithmetic.

Please expose a BNF, or at least an example or two of how the input can look here.  The parsing loop is not very readable as you've written it, so it's hard to work the expected format out of it and confirm your code is correct.

ok, below are a few notes on how to improve this code, since this is not the best way to use the tokenizer api.  we can simplify/clean up here a bit.

@@ +134,5 @@
> +    *charset = CHARSET_NOT_SET;
> +    Tokenizer p(ch);
> +    Tokenizer::Token t;
> +    nsAutoCString name;
> +    nsAutoCString value;

please use nsDependentCSubstring for name and value here.  It's a bit more efficient.

Then you can do name.Equals("...", nsCaseInsensitiveUTF8StringComparator()); if there is no LowerCaseEqualsLiteral() on that class.

@@ +136,5 @@
> +    Tokenizer::Token t;
> +    nsAutoCString name;
> +    nsAutoCString value;
> +    while (p.Next(t)) {
> +      p.Rollback();

you can do 

while (!p.CheckEOF()) {
  ...
}

@@ +137,5 @@
> +    nsAutoCString name;
> +    nsAutoCString value;
> +    while (p.Next(t)) {
> +      p.Rollback();
> +      while (p.CheckChar(',') || p.CheckWhite());

so, here you are skipping all occurrences of ',' and whitespaces?  so that an input like ",,,,,   ,  ,   \t,  " will be ignored?  Is that really correct?


and please, if you want an empty-body loop, write it as:

while (cond) {
}

@@ +139,5 @@
> +    while (p.Next(t)) {
> +      p.Rollback();
> +      while (p.CheckChar(',') || p.CheckWhite());
> +      if (!p.Next(t) || t.Type() == Tokenizer::TOKEN_EOF) {
> +        break;

if (p.CheckEOF()) {
  break;
}

will do..

@@ +141,5 @@
> +      while (p.CheckChar(',') || p.CheckWhite());
> +      if (!p.Next(t) || t.Type() == Tokenizer::TOKEN_EOF) {
> +        break;
> +      }
> +      p.Rollback();

so here you want the last whitespace or ',' to be read again?  this code doesn't make much sense.

@@ +143,5 @@
> +        break;
> +      }
> +      p.Rollback();
> +      p.Record();
> +      while (p.Next(t) && !t.Equals(Tokenizer::Token::Char('=')) &&

nit: !t.Equals(Tokenizer::Token::Char('=')) && on its own line please

@@ +144,5 @@
> +      }
> +      p.Rollback();
> +      p.Record();
> +      while (p.Next(t) && !t.Equals(Tokenizer::Token::Char('=')) &&
> +             t.Type() != Tokenizer::TOKEN_WS);

nit: there is also Token::Whitespace() so that you may use Equals for better code readability

@@ +149,5 @@
> +      if (p.HasFailed()) {
> +        return NS_ERROR_INVALID_ARG;
> +      }
> +      p.Claim(name);
> +      p.Rollback();

why rollback again?

@@ +156,5 @@
> +        return NS_ERROR_INVALID_ARG;
> +      }
> +      p.SkipWhites();
> +      bool quoted = p.CheckChar('"');
> +      if (quoted) {

can you just

if (p.CheckChar('"')) ?

@@ +163,5 @@
> +        }
> +      } else {
> +        p.Record();
> +        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')) &&
> +               t.Type() != Tokenizer::TOKEN_WS);

why is a whitespace considered a delimiter here?

(anyway, thinking of adding ReadUntil(function bool (Token&)) on tokenizer if needed)
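Pulling these notes together, the whole ParseCharset could look roughly like this (a sketch only: it reuses the patch's CHARSET_* values and the Tokenizer calls already shown in this bug, plus Tokenizer::ReadWord; untested, not the final patch):

// requires "mozilla/Tokenizer.h" and "nsUnicharUtils.h"
NS_IMETHODIMP
nsHttpBasicAuth::ParseCharset(const char *challenge, uint16_t *charset)
{
  *charset = CHARSET_NOT_SET;

  Tokenizer p(challenge);
  Tokenizer::Token t;
  nsAutoCString name, value;

  // skip the scheme name ("basic"), already validated by the caller
  p.SkipWhites();
  if (!p.ReadWord(name)) {
    return NS_ERROR_INVALID_ARG;
  }

  // auth-param *( "," auth-param ), parsed leniently
  while (!p.CheckEOF()) {
    // consume delimiters between params
    while (p.CheckChar(',') || p.CheckWhite()) {
    }
    if (p.CheckEOF()) {
      break;
    }
    if (!p.ReadWord(name)) {          // param name (token)
      return NS_ERROR_INVALID_ARG;
    }
    p.SkipWhites();
    if (!p.CheckChar('=')) {
      return NS_ERROR_INVALID_ARG;
    }
    p.SkipWhites();
    if (p.CheckChar('"')) {           // quoted-string value
      if (!p.ReadUntil(Tokenizer::Token::Char('"'), value)) {
        return NS_ERROR_INVALID_ARG;
      }
    } else {                          // token value, delimited by ',' or WS
      p.Record();
      while (p.Next(t) &&
             !t.Equals(Tokenizer::Token::Char(',')) &&
             !t.Equals(Tokenizer::Token::Whitespace())) {
      }
      p.Claim(value);
    }
    if (name.Equals("charset", nsCaseInsensitiveUTF8StringComparator())) {
      *charset = CHARSET_SPECIFIED;
      if (value.LowerCaseEqualsLiteral("utf-8")) {
        *charset |= CHARSET_UTF8;
      }
    }
  }

  return NS_OK;
}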
Attachment #8824634 - Flags: review?(honzab.moz) → feedback+
Comment on attachment 8821682 [details] [diff] [review]
patch for digest

Review of attachment 8821682 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ -68,5 @@
>    nsAutoCString hashString;
>    rv = mVerifier->Finish(false, hashString);
>    if (NS_FAILED(rv)) return rv;
>  
> -  NS_ENSURE_STATE(hashString.Length() == sizeof(mHashBuf));

maybe convert it to <= check and leave?  to make sure we don't overflow.
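i.e. something like:

NS_ENSURE_STATE(hashString.Length() <= sizeof(mHashBuf));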

@@ +344,5 @@
>    nsAutoCString authString;
>  
> +  if (userhash == USERHASH_TRUE) {
> +    char hashuser[EXPANDED_DIGEST_LENGTH+1];
> +    cUser.Append(":");

Append(':') is a bit faster

@@ +416,5 @@
>    }
>  
> +  if (userhash) {
> +    authString.AppendLiteral(", userhash=");
> +      if (userhash == USERHASH_TRUE) {

nit: indentation

@@ +504,5 @@
>      else
>        result[(index*2)+1] = value - 10 + 'a';
>    }
>  
> +  result[ExpandedDigestLength(algorithm)] = 0;

please walk the code carefully and check that result is always ExpandedDigestLength(algorithm) + 1 long.
Attachment #8821682 - Flags: review?(honzab.moz) → review+
Comment on attachment 8821682 [details] [diff] [review]
patch for digest

Review of attachment 8821682 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ +346,5 @@
> +  if (userhash == USERHASH_TRUE) {
> +    char hashuser[EXPANDED_DIGEST_LENGTH+1];
> +    cUser.Append(":");
> +    cUser.Append(realm);
> +    rv = DoHash(cUser.get(), cUser.Length(), algorithm);

one question: is there a spec for how the username has to be encoded before we give it to the hashing function?  I presume utf-8, but I'd rather check.
Comment on attachment 8822553 [details] [diff] [review]
patch for test basic

Review of attachment 8822553 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

::: netwerk/test/unit/test_authentication.js
@@ +156,5 @@
>  
>      if (this.flags & FLAG_BOGUS_USER)
>        this.user = "foo\nbar";
> +    if (this.flags & FLAG_UTF8_USER)
> +      this.user = "\u0443\u0442\u04448";

please have a global const for this (you are using it more than once)

@@ +518,5 @@
> +  else
> +  {
> +    // didn't know guest:guest, failure
> +    response.setStatusLine(metadata.httpVersion, 401, "Unauthorized");
> +    response.setHeader("WWW-Authenticate", 'Basic realm="secret", charset=UTF-8', false);

can we add some dummy arguments beside charset to exercise the parser a bit?
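For instance, a challenge along the lines of Basic foo="bar, baz", realm="secret", charset=UTF-8 (hypothetical values) would also exercise commas inside quoted strings.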
Attachment #8822553 - Flags: review?(honzab.moz) → review+
Comment on attachment 8821683 [details] [diff] [review]
patch for test digest

Review of attachment 8821683 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_authentication.js
@@ +156,5 @@
>  
>      if (this.flags & FLAG_BOGUS_USER)
>        this.user = "foo\nbar";
> +    if (this.flags & FLAG_UTF8_USER)
> +      this.user = "\u0443\u0442\u04448";

again, have a global const for this.

@@ +584,3 @@
>   var authenticate = 'Digest realm="secret", domain="/",  qop=auth,' +
>                      'algorithm=MD5, nonce="' + nonce+ '" opaque="' + 
> +                     opaque + '"' + ', algorithm=SHA-256';

algorithm param is there twice, once for MD5 and then for SHA-256

@@ +2100,5 @@
> +    var username = (auth.match(usernameRE))[1];
> +    var algorithm = (auth.match(algorithmRE))[1];
> +    var userhash = (auth.match(userhashRE))[1];
> +    if (userhash == "true") {
> +      if (username != H("guest:secret", algorithm)) {

might be more interesting to do this for a UTF-8 encoded user name?
Attachment #8821683 - Flags: review?(honzab.moz) → review+
(In reply to for nothing from comment #159)
> > This is still not a grammar-like parsing loop, sorry.
> 
> Still not sure that I understood correctly. Need something like this?

I was not reading this code, sorry.  But more or less, yes.  If you are lost in how to use the tokenizer, I can suggest how to write the parser loop.  I just need some BNF (too lazy to read the specs again :))

Thanks and sorry for delays!
Flags: needinfo?(honzab.moz)
The spec for the authentication framework is RFC 7235, with the ABNF for WWW-Authenticate in <https://greenbytes.de/tech/webdav/rfc7235.html#header.www-authenticate>.

The spec for the "Basic" authentication scheme is RFC 7617. I18N is discussed in <https://greenbytes.de/tech/webdav/rfc7617.html#rfc.section.2.1>.

By default, Firefox imho should continue to do what it always did (use ISO-8859-1). The charset param in the challenge allows the server to override this.
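For reference, the relevant ABNF from RFC 7235 is:

> challenge  = auth-scheme [ 1*SP ( token68 / #auth-param ) ]
> auth-param = token BWS "=" BWS ( token / quoted-string )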
Attached patch patch for digest — Splinter Review
>check that result is always ExpandedDigestLength(algorithm) + 1 long.
Ok, result is always EXPANDED_DIGEST_LENGTH+1 long, that is 64 + 1.

>how the username has to be encoded before we give it the hashing function?
https://tools.ietf.org/html/rfc7616#section-3.3
>charset
>This is an OPTIONAL parameter that is used by the server to
>indicate the encoding scheme it supports.  The only allowed value
>is "UTF-8".

>userhash
>This is an OPTIONAL parameter that is used by the server to
>indicate that it supports username hashing.  Valid values are:
>"true" or "false".  Default value is "false".

https://tools.ietf.org/html/rfc7616#section-3.4
>username
>The user's name in the specified realm.  The quoted string
>contains the name in plaintext or the hash code in hexadecimal
>notation.  If the username contains characters not allowed inside
>the ABNF quoted-string production, the username* parameter can be
>used.  Sending both username and username* in the same header
>option MUST be treated as an error.

>username*
>If the userhash parameter value is set "false" and the username
>contains characters not allowed inside the ABNF quoted-string
>production, the user's name can be sent with this parameter, using
>the extended notation defined in [RFC5987].

>name in plaintext or the hash code in hexadecimal notation.
Charset is optional, but I don't think the hash should be calculated from the raw UTF-16 bytes or from some serialized array with non-zero bytes.
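In other words, when the server advertises charset="UTF-8", the username presumably has to be converted to UTF-8 before hashing. Roughly (a sketch reusing the patch's DoHash; the surrounding variables are assumed from the quoted hunk above):

// encode the UTF-16 username as UTF-8, then hash "username:realm"
nsAutoCString cUser;
CopyUTF16toUTF8(user, cUser);
cUser.Append(':');
cUser.Append(realm);
rv = DoHash(cUser.get(), cUser.Length(), algorithm);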
Attachment #8821682 - Attachment is obsolete: true
Attachment #8848140 - Flags: review?(honzab.moz)
Attached patch patch for test digest — Splinter Review
Attachment #8821683 - Attachment is obsolete: true
Attachment #8848141 - Flags: review?(honzab.moz)
Attached patch patch for basic — Splinter Review
>> +               t.Type() != Tokenizer::TOKEN_WS);
>why is a whitespace considered a delimiter here?

Whitespace can be.
Attachment #8824634 - Attachment is obsolete: true
Attachment #8848143 - Flags: review?(honzab.moz)
Attached patch patch for test basic — Splinter Review
Attachment #8822553 - Attachment is obsolete: true
Attachment #8848144 - Flags: review?(honzab.moz)
I can't apply the patches (rejects in netwerk/test/unit/test_authentication.js.rej) regardless of the order in which I apply these four patches.  Maybe it's time to fold them back into a single patch now that this is so close to a final version.

Note that this has a very low priority (reason I am constantly not getting to review this).

Thanks.
Flags: needinfo?(8f7pfg)
Comment on attachment 8848140 [details] [diff] [review]
patch for digest

dropping r until there is a patch that applies to test locally and on the try server
Attachment #8848140 - Flags: review?(honzab.moz)
Attachment #8848141 - Flags: review?(honzab.moz)
Attachment #8848143 - Flags: review?(honzab.moz)
Attachment #8848144 - Flags: review?(honzab.moz)
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Assignee: 8f7pfg → nobody
Status: ASSIGNED → NEW
Whiteboard: [necko-would-take] → [necko-triaged]
Bug 1419658 has made Basic Auth default to UTF-8.
I scanned the patch here; it seems we need to support "userhash" for Digest Auth.
Moreover, Digest seems to support UTF-8.
However, if the WWW-Authenticate header indicates a charset, we need to specify it in the response.
Hence we still need some work here.
Flags: needinfo?(8f7pfg)

Hey Honza, if you are not the right person to ask about this, please remove the needinfo or redirect it to someone who might know better.

Is this issue still relevant today? Based on the Mozilla documentation (https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication), Firefox once used ISO-8859-1 but changed to UTF-8 for parity with other browsers and to avoid the potential problems described in bug 1419658.

And if I read things right, I quote, UTF-8 "contains virtually all characters in common use,[14] including most Chinese, Japanese and Korean characters", which basically covers the non-ISO-8859-1 characters. However, the comment above mine (comment 195) says there are still things to be worked on here. So should we keep this bug open for the reasons mentioned in that comment, or close it?

Flags: needinfo?(honzab.moz)

Hi Christoph, seems more like a question for you?

Flags: needinfo?(honzab.moz) → needinfo?(ckerschb)

The issue in this bug--the inability to use non-ASCII characters--was fixed by bug 1419658. The issues in comment 195 concern features added to HTTP in 2015, long after this bug was filed, by RFC 7616 and RFC 7617. This bug should not be morphed into an enhancement bug for adding new features from a later version of the spec--unless of course they were germane to solving the stated problem.

I filed bug 1735850 and bug 1735854 to address comment 195. I did not look through the RFCs to see if there are other things we haven't implemented or need to change.

Status: NEW → RESOLVED
Closed: 3 years ago
Depends on: 1419658
Flags: needinfo?(ckerschb)
Resolution: --- → FIXED
See Also: → 1735850, 1735854