Closed Bug 98929 Opened 21 years ago Closed 20 years ago

Implementation of Content-Language in HTTP

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: drepper, Assigned: neeti)

Details

Attachments

(7 files)

Please excuse me using the "Browser-General" component.  I don't know better. 
Please reassign if necessary.

I'll shortly attach a patch to implement the handling of the Content-Language
header of the HTTP protocol at document level.  It's actually a bit more.  If
the HTTP data doesn't contain the info of if it is no HTTP data, the preference
data is used.

With this (tested!) patch I can extend my implementation of the :lang() pseudo
class to handle all cases correctly (see bug
http://bugzilla.mozilla.org/show_bug.cgi?id=35768).

To see the patch working you can configure Apache to use the MultiView option
for a directory and add files like index.html.de.  With an appropriate setting
the mod_negotiate module will pick that file if "de" (German) is the preferred
language.  The transferred data looks like this:

HTTP/1.1 200 OK
Date: Sun, 09 Sep 2001 06:20:36 GMT
Server: Apache/1.3.20 (Unix)  (Red-Hat/Linux) mod_ssl/2.8.4 OpenSSL/0.9.6b
DAV/1.0.2 PHP/4.0.6 mod_perl/1.24_01 mod_throttle/3.1.2
Content-Location: index.html.de
Vary: negotiate,accept-language
TCN: choice
Last-Modified: Sun, 09 Sep 2001 06:14:17 GMT
ETag: "a2c5-147-3b9b08b9;3b9b08b9"
Accept-Ranges: bytes
Content-Length: 327
Connection: close
Content-Type: text/html
Content-Language: de

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
 <HEAD>
  <TITLE>Myware home</TITLE>
  <STYLE type="text/css">
q:lang(de) { quotes: "»" "«" }
  </STYLE>
 </HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
 <BODY BGCOLOR="#FFFFFF">
   <q>Nichts</q> zu sehen.
 </BODY>
</HTML>

This document is displayed with the quoting from the style sheet although no
lang attribute is present in the HTML.
Keywords: patch, review
-> http, I guess
Assignee: asa → neeti
Component: Browser-General → Networking: HTTP
QA Contact: doronr → tever
drepper:

  nsIHttpChannel::GetLanguage

could be replaced with a call to
  
  nsIHttpChannel::GetResponseHeader

i don't see any reason to introduce storage for mContentLanguage in
nsHttpResponseHead.
> i don't see any reason to introduce storage for mContentLanguage in
> nsHttpResponseHead.

I was just trying to be consistent.  This is what is done for mContentType and
mContentLanguage.  Since response headers are not too frequently allocated the
additional memory is IMO a price well payed to get some consistency in the code.

If you disagree I can change it but maybe you want to reconsider.
content-type and content-length are stored for reasons internal to the http
implementation.  the content-language header, however, is not needed interally
by the http protocol implementation.  therefore, it does not need extra storage.
 also, IMO GetLanguage should not be added.  content-language should be treated
similarly to content-disposition, which does not have a special accessor.
I the patch in attachement 48928 more to your liking?  I've tested it and it
still works for me.
your patch looks fine for the most part, but you'll need to get someone in
content to give you an r=, then i (or someone else) can give you an sr=.
Maybe you should modify nsDocument rather than nsHTMLDocument?  (Then you'd end
up hitting nsHTMLDocument and nsXMLDocument.  Or is there a reason you don't
want to modify nsXMLDocument?)

I'm not a big fan of |nsString*| members.  Why not just |nsString|?  (Well, one
answer is that |nsString| isn't as lightweight as we'd like, but...)

Also, the "intl.accept_languages" pref can contain multiple languages (mine is
"en, es, eo").  Is that appropriate here, or do you want to take only the part
before the first comma?  (And is that even an appropriate guess?)
> Or is there a reason you don't want to modify nsXMLDocument?)

The patch changes nsXMLDocument as well.  I'm just concerned about all other
uses of nsDocument.  If you say it's good to move it there I'll do it.  But I
don't want to do it right away.

> I'm not a big fan of |nsString*| members.  Why not just |nsString|?  (Well, one
> answer is that |nsString| isn't as lightweight as we'd like, but...)

I've used the same method elsewhere in these class.


> Also, the "intl.accept_languages" pref can contain multiple languages (mine is
> "en, es, eo").  Is that appropriate here, or do you want to take only the part
> before the first comma?  (And is that even an appropriate guess?)


The Content-Lanaguage value in HTTP also can contain multiple languages (stupid
but true).  So, storing the list is OK.
I've incorporated the comments of dbaron and jst.  The code compiles and works.

I saw some funny effects when loading some HTML docs. But I suspect that in my
tree due to missing dependencies not all files got recompiled after the change.
 I'm rebuilding the whole tree which takes a while on my machine. But the patch
is already attached to this bug (attachment 48965 [details] [diff] [review]).
Comment on attachment 48965 [details] [diff] [review]
move main changes to nsDocument

sr=jst
Attachment #48965 - Flags: superreview+
why does mContentLanguage need to be stored as unicode?  isn't it's value
guaranteed to be ascii?
> isn't it's value guaranteed to be ascii?


Yes, it is.  I can change this as well but it's getting more and more different
from the rest of the code.  If this should serve as a good example I'm fine with
it but changing it only "because it's nice" isn't worth it IMO.  More code will
be needed since then the resources for the string have to be handled explicitly.
so, if mContentLanguage never needs to be converted to a unicode string, then
there is no reason to store it as a unicode string.  if it does, however, need
to be accessed as a unicode string, then perhaps storing it as unicode makes sense.
> if it does, however, need to be accessed as a unicode string, then perhaps
> storing it as unicode makes sense.

First thing I do in the code where I use GetContentLanguage() is to convert it
to ASCII.  So, this would mean using const char* is correct.  But I don't know
what other uses will follow in future.
Next try.  darin insisted on using char data.  This is indeed what I expect in
the :lang() implementation so I updated the patch.  Still works for me.
For completeness, I've attached the patch to use nsCString instead of char.

jst currently leans towards using the original patch with nsString since the
strings this string will be compared with is also in nsString format and so we
avoid converting again and again.
Darin, storing this string as unicode makes sense since the code that will be
using this string (the style system) is all unicode based, if we'd store this as
an 8-bit string we'll end up having to convert the string for every access from
the style system, which can be a fair amount of accesses while laying out a
page. The overhead of storing this string as unicode is lost in the noice
anyway, so storing it as unicode is IMO the better of the two alternatives. I'd
say we go with attachment 48982 [details] [diff] [review].

Thanks Ulrich for fixing this!
Comment on attachment 48982 [details] [diff] [review]
little bug fix + optimizations

> 
>+
> nsDOMStyleSheetList::nsDOMStyleSheetList(nsIDocument *aDocument)
> {
>   NS_INIT_REFCNT();

Unnecessary extra line ;). r=peterv.
Attachment #48982 - Flags: review+
sounds good to me... thanks for walking through the content-language usage.
r/sr=darin
It was checked in by timeless@mac.com on September 11th.

-> Fixed.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.