User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2) Gecko/20100115 Firefox/3.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2) Gecko/20100115 Firefox/3.6

When using HTTP digest authentication, a browser is supposed to retry the request (see section 3.2.2 of RFC 2617), passing an Authorization header line. This works in general. But if a username (no password) containing a special character such as the German umlaut "ä" (ISO-8859-1: decimal 228, hex E4) is entered, the browser acts as if the authentication had been canceled by the user.

Reproducible: Always

Steps to Reproduce:
0. Prepare an HTTP sniffing tool such as Fiddler
1. Point your Firefox to http://digest.reisi.com
2. Enter one special umlaut character into the username textbox
3. Hit Enter

Actual Results:
Firefox does not retry the request with the supplied username

Expected Results:
Firefox should retry the request with the Authorization line

See http://digest.reisi.com/source.html for all the source code behind this microsite. The source code of .htaccess and index.php is provided. I used Fiddler to check whether any requests were made.
Sounds like likely confusion between UTF-8 and ISO-8859-1 somewhere... Honza, do you know anything about this offhand?
When I try the steps in comment 0 I get back a page that says "died" and nothing else. Is that indicating that the bug was observed?
Apparently yes, to answer my question from comment 2. A log in a debug build per http://www.mozilla.org/projects/netlib/http/http-debugging.html shows this:

-1606793440[90a4e0]: nsHttpDigestAuth::GenerateCredentials [challenge=Digest realm="Test - test",qop="auth",nonce="4b79fe27f35ff",opaque="293906c87ba9c07bbcc539da83dde298"]
-1606793440[90a4e0]: nonce_count=00000001
-1606793440[90a4e0]: cnonce=01b6730aae57c007
-1606793440[90a4e0]: WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /Users/bzbarsky/mozilla/vanilla/mozilla/netwerk/protocol/http/src/nsHttpDigestAuth.cpp, line 358
-1606793440[90a4e0]: nsHttpChannel::OnAuthCancelled [this=1ec456f0]

Relevant code:

356   authString.AssignLiteral("Digest username=");
357   rv = AppendQuotedString(cUser, authString);
358   NS_ENSURE_SUCCESS(rv, rv);

Looks like AppendQuotedString just can't handle non-ASCII as written. It bails out because non-ASCII chars test true for <= 31 (since they're negative, as signed chars). But if it didn't, it would just create an invalid header anyway, per the comment at the end of it.
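To make the signed-char pitfall described above concrete, here is a minimal Python sketch that emulates the comparison. It is not the Mozilla code: `as_signed_char` and `rejected_as_control` are hypothetical names, and the `<= 31` test is taken only from the description in this comment.

```python
def as_signed_char(byte: int) -> int:
    """Reinterpret an octet (0..255) the way a C signed char would (-128..127)."""
    return byte - 256 if byte > 127 else byte


def rejected_as_control(byte: int, signed: bool) -> bool:
    """Emulate a `c <= 31` control-character test on a single octet."""
    value = as_signed_char(byte) if signed else byte
    return value <= 31


# Any octet >= 0x80 becomes negative as a signed char, so it looks like a
# control character; comparing the unsigned value avoids the false positive.
for octet in (0x61, 0xC3, 0xE4):
    print(hex(octet), rejected_as_control(octet, signed=True),
          rejected_as_control(octet, signed=False))
```

Comparing the unsigned value would stop the early bail-out, but as noted above the resulting header would still carry raw non-ASCII octets.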
Just added a line to http://digest.reisi.com/index.php to indicate whether the bug is fixed and to clarify the expected result. So: New expected result: output of "bug fixed" after entering an umlaut (ä). New actual result: output of "died". Please refer to http://digest.reisi.com/source.html
nsHttpDigestAuth::AppendQuotedString seems to expect ISO-8859-1 as input, but it gets a UTF-8 encoded string and walks each octet of it. It doesn't break because the value of ä is e4, but because there are two UTF-8 octets, c3 a4, and the first one already fails that condition. When I look at the patch for bug 41489, it seems to change the lossy convert to a regular convert in the basic auth code. The digest code seems to use the regular convert already. As I read RFC 2047, we should probably use an 'encoded-word' for this, which includes the character set (UTF-8 in our case), the encoding (probably 'q') and the encoded string. I'm just not sure whether "An 'encoded-word' may replace a 'text' token" also applies to a 'quoted-string', which is by definition 'text' except for the '"' char.
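For reference, the two byte representations of "ä" discussed above can be checked with a short Python snippet. This only illustrates the encodings; it says nothing about how the Mozilla string classes convert internally.

```python
name = "\u00e4"  # the character "ä"

latin1_octets = name.encode("iso-8859-1")  # one octet
utf8_octets = name.encode("utf-8")         # two octets

print(latin1_octets.hex())  # e4
print(utf8_octets.hex())    # c3a4
```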
Oh, I see. This bug is about digest auth, not basic auth. Gotcha. So 2047-encoding is the only way to stuff these chars into this header, no?
If there is no conversion from UTF-8 to ISO Latin-1 (I expect not), after which we could simply quote the string, then I cannot see another option at the moment, but I'm still googling.
Taking that back: an 'encoded-word' MUST NOT appear within a 'quoted-string'. See RFC 2047: http://www.faqs.org/rfcs/rfc2047.html
My other idea is to follow the solution from bug 41489 comment 84: first send just the lower byte (lossy convert, quoted), then try to send it in UTF-8 (quoted). Nothing specifies how to declare the character encoding used for a quoted-string, or how to encode the username in the Authorization header. I'm going to check how some server implementations deal with non-ASCII chars in the username.
Alois: how did you actually set the username on the server? bz: for example, htdigest for Apache configuration takes the username directly from the command line and computes HA1 directly from the bytes taken from the terminal (in its native encoding). mod_auth_digest.c takes the de-quoted string as is and computes HA1. I'm no expert on passing non-ASCII characters from a terminal to arguments, but it seems to me that clients would have to know the encoding used to generate the password hash on the server, which is obviously impossible.
Actually, I can see two problems here: 1. We have to send the Username= portion in the Authorization header. It is by definition a quoted-string, which is unable to carry any encoding information (comment 8), and I haven't found any encoding requirement for the header or for a quoted-string in any RFC. 2. We have to compute the HA1 hash from the byte representation of the user name, and nothing says which character set or encoding it should be in before we pass it to the hash function. Issue 1 also affects basic auth, which has to base64-encode the user name's byte representation without any statement of which charset it has to be in. That is actually the reason why bug 41489 is still not fixed.
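The second problem can be sketched in a few lines of Python. RFC 2617 defines HA1 as the MD5 digest of `username:realm:password`; the sketch below assumes that definition, takes the realm from the debug log earlier in this bug, and uses a made-up password. `ha1` is a hypothetical helper, not any server's actual code.

```python
import hashlib


def ha1(username: str, realm: str, password: str, charset: str) -> str:
    """MD5 over "username:realm:password", encoded with the given charset."""
    a1 = f"{username}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


realm = "Test - test"   # realm from the debug log
password = "secret"     # hypothetical password, for illustration only

# Same user name, two encodings, two different hashes: client and server
# must agree on the charset or authentication can never succeed.
print(ha1("\u00e4", realm, password, "iso-8859-1"))
print(ha1("\u00e4", realm, password, "utf-8"))
```

For a pure ASCII user name both calls yield the same hash, which is why the problem only shows up with characters such as "ä".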
So I guess the question is whether we're trying to make non-ASCII usernames "work" or whether we're just trying to make sure we don't fall back to showing the 401 body even if the user enters a non-ASCII username... The latter is what comment 0 is about, strictly speaking...
Regarding #c10, we could IMHO assume that the native encoding on the Apache server is UTF-8. At least this is true for most Linux distros (and has been for several years). So we can send the username as quoted UTF-8. FYI this is what Opera does.
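A rough sketch of what "quoted UTF-8" could look like on the wire, assuming the usual quoted-string rules (backslash-escape `"` and `\`, wrap in double quotes). `quoted_string` is a hypothetical helper; this is neither Opera's nor Mozilla's implementation.

```python
def quoted_string(value: str, charset: str = "utf-8") -> bytes:
    """Escape backslash and double quote, then wrap the encoded octets in quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return b'"' + escaped.encode(charset) + b'"'


# Start of an Authorization header value carrying the user name as raw UTF-8.
header_start = b"Digest username=" + quoted_string("\u00e4")
print(header_start)
```

Note that nothing in the header tells the server which charset was used, which is the gap discussed in the comments above.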
Btw. it is also a question how to handle non-ASCII passwords. They aren't sent over the wire, but it is a question in which form they should be hashed. Non-quoted UTF-8?
In fact, I would recommend two steps. First, the violation of RFC 2617 should be corrected. I think it's better to send "somehow" converted characters back to the server and retry the request as stated in the RFC. In this (admittedly very bad) case, someone could at least implement their own, or the correct, algorithm on the server side. At least there would be a chance to. Right now there is no chance at all. BTW, in my implementation I go fully UTF-8, as most sites do nowadays. This is why I think switching to UTF-8 as the default (like Opera) would also be a very good choice. Second, I'd recommend some extension to the RFC itself, like a charset parameter in the WWW-Authenticate line and in the Authorization line. I thought I had read something like this in an RFC draft somewhere, but I can't find it yet. But I guess this Bugzilla entry is the wrong place to discuss this. Maybe one of the fellow readers has the right connection to the correct place...
Alois, good summary. UTF-8 is a good way to go here, also supported by comment 13. bz, do you agree? Adding a new error code is not a simple way to go if we want this fix on a branch, since it means new strings.
> BTW, in my implementation i go fully UTF-8
Note that doing that on our end violates RFC 2617... I guess that's where the extension comes in?
(In reply to comment #17)
> > BTW, in my implementation i go fully UTF-8
>
> Note that doing that on our end violates RFC 2617...
Because quoted-string is supposed to be in the 8859-1 charset?
> I guess that's where the extension comes in?
Use auth-param for this? As I read RFC 2617, it seems we should hash the quoted string directly (with potential =NN in it) but without the leading and trailing '"'. That's something Apache is not doing... Seems like something is really missing in that RFC. To move to a conclusion: do we want to support characters above 127 in user names? Do we want to at least try something that works with most server implementations, which seem to differ as well? WONTFIX this?
> Because quoted-string is supposed to be in 8859-1 charset?
Yep.
> Seems like something is really missing in that RFC.
Yes, like provisions for anything other than West European characters. Have we tried contacting the relevant IETF folks about this?
Just for my information: Is anyone taking care of this? (contacting the cool guys at IETF ;-)
I'll do that soon.
Just to keep it on our minds.... any news on that?
(In reply to comment #19)
> Have we tried contacting the relevant IETF folks about this?
I did on firstname.lastname@example.org. No answer from the admin. Any suggestions about a forum to post to?
No idea. I would have thought the IETF would answer its mail....
I got an answer from Henrik Nordstrom: feel free to propose an extension to both the basic and digest auth schemes. I heard some have already been proposed or drafted. I'll look for them, or let's propose one ourselves.
(In reply to comment #24)
> No idea. I would have thought the IETF would answer its mail....
The IETF is a loose connection of people. There's no "IETF" entity one sends email to. That being said, the feedback from the HTTPbis WG IMHO is clear (and has been for some time): you'll need to extend the authentication schemes.
Possibly this should be WONTFIX.
https://tools.ietf.org/html/rfc7616#section-4 is supposed to answer this.
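As a hedged illustration of the direction RFC 7616 takes: the server can advertise `charset=UTF-8` in its challenge, and a user name that does not fit a quoted-string can be sent in a `username*` parameter using the RFC 5987 extended notation. The sketch below builds such a value. `username_ext_value` is a hypothetical helper, and the NFC normalization step is an assumption that should be checked against the RFC text before relying on it.

```python
import unicodedata
from urllib.parse import quote


def username_ext_value(username: str) -> str:
    """Build an RFC 5987 style ext-value: charset, empty language, percent-encoded UTF-8."""
    normalized = unicodedata.normalize("NFC", username)  # assumption: normalize first
    return "UTF-8''" + quote(normalized, safe="")


print("username*=" + username_ext_value("\u00e4"))
```

With this form the header stays pure ASCII and names its own charset, which addresses the first of the two problems raised earlier in this bug.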