Open Bug 557507 (ContentLanguageMulti) Opened 15 years ago Updated 2 years ago

Treat the presence of a comma inside Content-Language (META or HTTP) identical with how Mozilla treats the empty string inside the Content-Language (META or HTTP)

Categories

(Core :: DOM: Core & HTML, defect, P5)


Tracking

()

People

(Reporter: xn--mlform-iua, Unassigned)

References

()

Details

(Keywords: html5)

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_5_8; nn-no) AppleWebKit/531.22.7 (KHTML, like Gecko) iCab/4.7 Safari/525.27.1
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a2pre) Gecko/20100217 Minefield/3.7a2pre
The HTML5 working group is trying to come to agreement about how to understand and how to handle the META content-language declaration. Jon Sicking has expressed willingness to change Mozilla. This bug report represents a suggested change, which should be uncontroversial. See reply to Jon: http://lists.w3.org/Archives/Public/public-html/2010Apr/0146
Basically: The idea is to make sure that the last occurring META content-language element always becomes the "end station". To achieve this, HTML5 should define the semantics of the empty string inside META content-language and inside xml:lang="*"/lang="*" as the same. (HTML5 aligns lang="<empty>" with the semantics of xml:lang="<empty>" - both now have the meaning "unknown language".) But in Mozilla browsers, currently, if the META contains the empty string, then Mozilla browsers will visit the preceding META content-language declaration, or the HTTP content-language header.
Reproducible: Always
Steps to Reproduce:
1. Create a page with a last, empty META content-language declaration as well as another, preceding non-empty content-language declaration. The page should not contain any xml:lang="*" or lang="*" attributes.
2. Define a CSS selector ELEMENT:lang(non-empty){color:red} and another ELEMENT{color:lime} selector.
3. See which CSS selector "wins".
Actual Results: In Mozilla browsers the selector ELEMENT:lang(non-empty){color:red} will win.
Expected Results: The selector ELEMENT{color:lime} should have won. (Because other UAs behave like that.)
A bug in the HTML5 specification with focus on this Mozilla issue has been filed: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9422
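The difference between the actual and expected results in this report can be modelled roughly as follows. This is only a sketch of the behaviour as described in the report, not Gecko code: the function names GeckoLikeLanguage and LastDeclarationWins are hypothetical, and an empty string stands in for "no language declared".

```cpp
#include <string>
#include <vector>

// Model of the behaviour reported for Mozilla browsers (an assumption based
// on the report): empty content values are skipped, so the last non-empty
// declaration, or else the HTTP header, supplies the language.
std::string GeckoLikeLanguage(const std::vector<std::string>& metaContents,
                              const std::string& httpHeader) {
  std::string result = httpHeader;
  for (const std::string& content : metaContents) {
    if (!content.empty()) {
      result = content;
    }
  }
  return result;
}

// Model of the behaviour the report asks for: the last META declaration is
// the "end station", even when its content is the empty string.
std::string LastDeclarationWins(const std::vector<std::string>& metaContents,
                                const std::string& httpHeader) {
  return metaContents.empty() ? httpHeader : metaContents.back();
}
```

For the test page in the steps to reproduce (a non-empty declaration followed by an empty one), the first function still reports the earlier language, so the :lang() selector matches, while the second reports no language at all.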
NOTE: Mozilla also treats the presence of a single comma, <meta http-equiv="content-language" content="," />, differently from other UAs. In most other UAs, the comma is seen as a language tag, and can be matched by the following selector: ELEMENT:lang(\,) {color:lime} Whereas in Mozilla browsers - and actually also in Opera! - the single comma is treated more or less like the empty string is treated in the other user agents. For instance, in Opera and Mozilla, the comma is impossible to select via the above selector. Please let me know if a separate bug report should be filed for this.
Not parser.
Component: HTML: Parser → DOM
QA Contact: parser → general
So all that's going on here is that nsContentSink::ProcessMETATag has this code:
    aContent->GetAttr(kNameSpaceID_None, nsGkAtoms::content, result);
    if (!result.IsEmpty()) {
      ToLowerCase(header);
      nsCOMPtr<nsIAtom> fieldAtom(do_GetAtom(header));
      rv = ProcessHeaderData(fieldAtom, result, aContent);
    }
So all empty values of @content on <meta> are always ignored. I would vastly prefer that this behavior not depend on the exact value of http-equiv, since that's a rat's nest of compat issues. In any case, I see no point in changing anything here until the spec stabilizes.
The ',' thing looks like correct behavior to me, since the content-language value is a comma-separated list of languages. See RFC 2616 section 14.12. Good thing to know that Opera gets this right too; feel free to file bugs on any UAs that don't.
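To illustrate why a lone comma ends up behaving like an empty value when Content-Language is read as a comma-separated list, here is a minimal sketch in standard C++. ParseContentLanguage is a hypothetical helper, not a Gecko function, and dropping empty list items is an assumption of this sketch rather than something the thread confirms about Gecko's parser.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not a Gecko API: split a Content-Language value into
// lowercase language tags, dropping list items that are empty after trimming.
std::vector<std::string> ParseContentLanguage(const std::string& value) {
  std::vector<std::string> tags;
  std::istringstream in(value);
  std::string item;
  const char* ws = " \t\r\n\f";
  while (std::getline(in, item, ',')) {
    std::string::size_type first = item.find_first_not_of(ws);
    if (first == std::string::npos) {
      continue;  // empty or whitespace-only list item
    }
    std::string::size_type last = item.find_last_not_of(ws);
    std::string tag = item.substr(first, last - first + 1);
    std::transform(tag.begin(), tag.end(), tag.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    tags.push_back(tag);
  }
  return tags;
}
```

Under these assumptions, content="," yields no language tags at all, which is the same result as an empty content value, whereas a UA that treats the whole string as one tag would see "," as a language.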
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: html5
I would like to suggest a WONTFIX for this bug. However, since I do not have the power to change the status of this bug, and because it is simpler than filing a new one, I will instead try to refocus this bug by changing its subject line from (old): "Treat the empty string inside META content-language identical with lang="<emptystring>" (like, basically, all non-gecko UAs do)" to (new): "Treat the presence of a comma inside Content-Language (META or HTTP) identical with how Mozilla treats the empty string inside the Content-Language (META or HTTP)"
Justification and explanation:
Firstly, please consult with Fantasai. We had an exchange on the HTMLwg mailing list today: http://www.w3.org/mid/20100506160500287451.42e2fa29@xn--mlform-iua.no And as far as I can see, we are on the same page. Thus you may find it useful to talk with her to understand my perspective.
Secondly: I did not accept Ian's fallback language algorithm when I filed this bug. But Ian (and others I discussed with) put me in a good school so ... ;-) If you accept to treat the empty string like I proposed when I first filed this bug, then I think you will only drive the HTML working group further away from reaching consensus. Also, please ignore what I said about Opera - I probably misinterpreted Opera quite thoroughly - I don't think it has any support for Content-Language at all.
Thirdly: The current version of my change proposal, which Fantasai offered input on today (see above), fully accepts Ian's fallback language algorithm. Which means that Gecko behaves correctly with regard to the empty string. (My change proposal is only about what the legal syntax - for authors - should be, not about UA behaviour.)
Fourth: There is only one bug in Gecko w.r.t. Content-Language, compared with the fallback language algorithm in HTML5: Multiple language tags (or, if you wish, the presence of a comma) should be treated like Gecko currently treats the empty string: it should lead Gecko to not set the fallback language to anything and, if it was the META that was empty, then it should lead Gecko to go visit the HTTP header.
Hence I suggest either setting this bug to WONTFIX or refocusing it to be a bug about changing how Mozilla treats multiple language tags inside Content-Language. (There is also another bug which *relates* to Content-Language, namely that the empty string of lang="" is not respected whenever Content-Language is set. But that is - at least conceptually - a bug in how @lang is implemented. I have yet to file a bug about this.)
Alias: ContentLanguageMulti
Summary: Treat the empty string inside META content-language identical with lang="<emptystring>" (like, basically, all non-gecko UAs do) → Treat the presence of a comma inside Content-Language (META or HTTP) identical with how Mozilla treats the empty string inside the Content-Language (META or HTTP)
<delete> and, if it was the META that was empty, </delete> <insert> and, if it was the META that had multiple language tags, </insert>
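The behaviour proposed in the "Fourth" point of the comment above, with this correction applied, could be sketched like this. It is a model of the proposal as worded in this thread, not of the HTML5 text or of Gecko: SingleTagOrNone and ProposedFallbackLanguage are hypothetical names, an empty string stands in for "no fallback language" and for a missing header, and the check for a lang attribute on the root element is left out.

```cpp
#include <string>

// Hypothetical helper: return the trimmed value if it holds exactly one
// language tag; return "" if it is empty or contains a comma.
std::string SingleTagOrNone(const std::string& value) {
  if (value.find(',') != std::string::npos) {
    return "";  // multiple tags (or a lone comma): no fallback candidate
  }
  const char* ws = " \t\r\n\f";
  std::string::size_type first = value.find_first_not_of(ws);
  if (first == std::string::npos) {
    return "";  // empty or whitespace-only value
  }
  std::string::size_type last = value.find_last_not_of(ws);
  return value.substr(first, last - first + 1);
}

// Proposed fallback: use the META value if it is a single tag; otherwise
// visit the HTTP header, which is also only used if it is a single tag.
std::string ProposedFallbackLanguage(const std::string& metaContent,
                                     const std::string& httpHeader) {
  std::string fromMeta = SingleTagOrNone(metaContent);
  if (!fromMeta.empty()) {
    return fromMeta;
  }
  return SingleTagOrNone(httpHeader);
}
```

With a META value of "en, fr" and an HTTP header of "nb", this model falls through to "nb"; with a comma in both places it sets no fallback language.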
(In reply to comment #4)
> (There is also another bug which *relates* to Content-Language, namely that the empty string of lang="" is not respected whenever Content-Language is set. But that is - at least conceptually - a bug in how @lang is implemented. I have yet to file a bug about this.)
Now filed a bug about this: bug 564290
> Multiple language tags (or, if you wish, the presence of a comma) should be treated like Gecko currently treats the empty string: it should lead Gecko to not set the fallback language to anything
Why? The HTTP RFC is _very_ clear on the multiple language tags thing, as far as I can tell.
Note that "refocusing" bugs as you called it (or "mutating" as it's usually called) is generally a bad idea.
It's very clear that multiple language tags are allowed, but Content-Language represents the intended audience language, not the content language. If there's only one audience language, you can reasonably guess that, barring information to the contrary, the primary content language is the same as the audience language. That is reasonable fallback behavior. But if there are multiple audience languages, it's not as straightforward to guess the primary content language from the Content-Language header, so what Leif is saying is that we should not assume any fallback in such cases.
(In reply to comment #7)
> > Multiple language tags (or, if you wish, the presence of a comma) should be treated like Gecko currently treats the empty string: it should lead Gecko to not set the fallback language to anything
> Why? The HTTP RFC is _very_ clear on the multiple language tags thing, as far as I can tell.
I am only trying to express what HTML5 has defined the effect on parsers to be. Also, logically, there are only two ways to interpret multiple language tags inside Content-Language when we consider its fallback language effect: EITHER all languages are treated as equal fallback language candidates, OR none of them are treated as fallback language candidates. Gecko currently uses the first option. There are some in the HTMLwg who think the Gecko behaviour is superior.
My change proposal [1] says that validators should show a warning each time a fallback language effect [aka Content-Language] kicks in. With the way HTML5 defines the fallback effect of Content-Language, it only kicks in when Content-Language contains a single tag (and only if the root element doesn't have a lang attribute). If the Gecko behaviour were standardized, then validators would have to display a warning also when Content-Language contained multiple language tags. For those that are against Content-Language inside the META element ... then perhaps the many more warnings that this would cause could encourage them to support the Gecko behaviour. ;-)
Ian's argument against the Gecko behaviour is that, whenever it kicks in, it causes an element to be in multiple languages simultaneously. I could live with the Gecko behaviour, but it causes some illogical effects. E.g. a "Content-Language: foo, bar" would make a CSS selector like this work - without a single lang attribute in the document:
    *:lang(foo) > *:lang(bar){color:red}
Therefore I have thrown myself behind the HTML5 specced behaviour. There is a shortage of arguments for supporting Gecko's behaviour ... Currently it is only useful as a way to target Mozilla with a selector that only works in Mozilla ...
[1] http://www.w3.org/html/wg/wiki/ChangeProposals/ContentLanguages
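The two interpretations contrasted in this comment can be compared with a small model. This is a sketch under stated assumptions, not browser code: AnyTagMatches stands for the behaviour attributed to Gecko here (every listed tag is a fallback candidate), SingleTagMatches for the HTML5 behaviour as described here (a value containing a comma gives no fallback), and LangRangeMatches is a simplified prefix match in the spirit of :lang(). All three names, and NormalizeTag, are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: trim surrounding whitespace and lowercase a tag.
std::string NormalizeTag(const std::string& raw) {
  const char* ws = " \t\r\n\f";
  std::string::size_type first = raw.find_first_not_of(ws);
  if (first == std::string::npos) {
    return "";
  }
  std::string::size_type last = raw.find_last_not_of(ws);
  std::string tag = raw.substr(first, last - first + 1);
  std::transform(tag.begin(), tag.end(), tag.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return tag;
}

// Simplified :lang()-style match: "en" matches "en" and "en-US".
bool LangRangeMatches(const std::string& tag, const std::string& range) {
  std::string t = NormalizeTag(tag);
  std::string r = NormalizeTag(range);
  if (r.empty() || t.size() < r.size()) {
    return false;
  }
  if (t.compare(0, r.size(), r) != 0) {
    return false;
  }
  return t.size() == r.size() || t[r.size()] == '-';
}

// Behaviour attributed to Gecko in this thread: any listed tag may match.
bool AnyTagMatches(const std::string& contentLanguage, const std::string& range) {
  std::istringstream in(contentLanguage);
  std::string tag;
  while (std::getline(in, tag, ',')) {
    if (LangRangeMatches(tag, range)) {
      return true;
    }
  }
  return false;
}

// HTML5 behaviour as described in this thread: a comma means no fallback.
bool SingleTagMatches(const std::string& contentLanguage, const std::string& range) {
  if (contentLanguage.find(',') != std::string::npos) {
    return false;
  }
  return LangRangeMatches(contentLanguage, range);
}
```

With "Content-Language: foo, bar", the first model matches both :lang(foo) and :lang(bar) on every element, which is what lets the selector in the comment apply; the second model matches neither.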
(In reply to comment #8)
> so what Leif is saying is that we should not assume any fallback in such cases.
Yes. Except that I only try to express what HTML5 says. It is a little bit sad for me to say that Gecko should change, because it has always had this behaviour which in many ways was superior and which also can be claimed to be standards based ... At any rate, even if Gecko implements this, you will still have a leading edge, since you support both the HTTP header and the META element, which otherwise only IE8 in standards mode does.
> so what Leif is saying is that we should not assume any fallback in such cases
OK. Instead of "refocusing" bugs, can we just get a clear bug filed on that?
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046 Move all DOM bugs that haven’t been updated in more than 3 years and have no one currently assigned to P5. If you have questions, please contact :mdaly.
Priority: -- → P5
Component: DOM → DOM: Core & HTML
Severity: normal → S3