Closed Bug 308187 (haaretz) Opened 19 years ago Closed 13 years ago

haaretz.co.il - wrong charset in HTTP headers (csISOLatinHebrew instead of Windows-1255)

Categories

(Tech Evangelism Graveyard :: Hebrew, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: reuven, Unassigned)

Details

(Whiteboard: [contacted] - see comment #3)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4

Firefox 1.0.x displayed Hebrew Web pages quite nicely, taking into account the
fact that numbers are displayed from left to right.  The 1.5 beta build, on the
Mac at least, seems to be a regression.  Numbers are normally displayed just
fine, but when they are attached to a word (as is often the case with a hyphen),
the numbers are reversed.  It might also occur sometimes without a hyphen. 
Moreover, the number is placed on the wrong part of the line, out of its correct
place within the sentence.

Reproducible: Always

Steps to Reproduce:
1. Go to
http://www.haaretz.co.il/hasite/pages/ShArtPE.jhtml?contrassID=2&subContrassID=3&sbSubContrassID=0&itemNo=623784
2. Look at the number in the headline, which should be 1967.

Actual Results:  
I filed this bug report.

Expected Results:  
It should have shown the accurate year in the right place.
Seeing this on Windows also
Assignee: nobody → mozilla
Component: General → Layout: BiDi Hebrew & Arabic
OS: MacOS X → All
Product: Firefox → Core
QA Contact: general → zach
Hardware: Macintosh → All
Version: unspecified → Trunk
There is an incorrect charset in the HTTP header:

Content-Type: text/html; charset="csISOLatinHebrew"

and the page encoding is set to Hebrew Visual. Resetting manually to Hebrew
(Windows-1255) makes the ordering OK again.

This is really an evangelism bug, since csISOLatinHebrew is registered as an
alias of ISO-8859-8 in the IANA registry.
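To illustrate why resetting the encoding changes only the ordering and not the letters themselves, here is a minimal sketch using Python's standard codecs (the codec names `iso8859_8` and `cp1255` are Python's names for ISO-8859-8 and Windows-1255; the variable names are made up for this example). The Hebrew letters sit at the same byte values in both encodings, so the visible difference comes from the browser treating ISO-8859-8 as Hebrew Visual, as noted above.

```python
# Hebrew letters occupy bytes 0xE0-0xFA in both ISO-8859-8 and Windows-1255.
hebrew_letter_bytes = bytes(range(0xE0, 0xFB))

as_iso_8859_8 = hebrew_letter_bytes.decode("iso8859_8")
as_windows_1255 = hebrew_letter_bytes.decode("cp1255")

# Both decodings yield the same characters (alef through tav); only the
# directionality handling applied by the browser differs.
same_letters = as_iso_8859_8 == as_windows_1255
print(same_letters)
```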
Component: Layout: BiDi Hebrew & Arabic → Hebrew
Priority: -- → P1
Product: Core → Tech Evangelism
Summary: Numerals are displayed from right to left when part of a word containing Hebrew letters → haaretz.co.il - wrong charset in HTTP headers (csISOLatinHebrew instead of Windows-1255)
Version: Trunk → unspecified
Assignee: mozilla → hebrew
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: zach → hebrew
I sent an e-mail with the details of the problem to helpdesk@haaretz.co.il (I
was provided with this address by phone, after e-mails sent to the published
address, online@haaretz.co.il, bounced).
*** Bug 308447 has been marked as a duplicate of this bug. ***
*** Bug 308532 has been marked as a duplicate of this bug. ***
Whiteboard: [contacted] - see comment #3
*** Bug 311658 has been marked as a duplicate of this bug. ***
(In reply to comment #2)
> There is an incorrect charset in the HTTP header:
> 
> Content-Type: text/html; charset="csISOLatinHebrew"

Today, I see this in the header of one article:
HTTP-EQUIV="Content-Type" content="text/html; charset=windows-1255"
Still the browser uses iso8859-8.
(In reply to comment #7)
> Today, I see this this in the header of one article:
> HTTP-EQUIV="Content-Type" content="text/html; charset=windows-1255"
> Still the browser uses iso8859-8.

The problem isn't (and never was) with the HTTP-EQUIV meta tags in the HTML. The
problem is with the HTTP headers themselves (which are not part of the HTML
source, so you can't see them there).
One way to see the HTTP headers (where the problem is) is to use the "Live HTTP
Headers" extension (http://livehttpheaders.mozdev.org/).
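The headers can also be checked with a short script instead of an extension. This is only a sketch, assuming Python's standard library; `declared_charset` and `fetch_declared_charset` are hypothetical helper names, and the sample header value is the one quoted in comment #2.

```python
import urllib.request
from email.message import Message


def declared_charset(headers):
    """Return the charset parameter of Content-Type (unquoted, lowercased), or None."""
    return headers.get_content_charset()


def fetch_declared_charset(url):
    """Send a HEAD request and report the charset the server declares over HTTP."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # response.headers is an email.message.Message subclass
        return declared_charset(response.headers)


# Offline demonstration with the header value reported in this bug.
example = Message()
example["Content-Type"] = 'text/html; charset="csISOLatinHebrew"'
print(declared_charset(example))
```

Calling `fetch_declared_charset` on a live URL needs network access; the offline example only shows how the quoted label is extracted from the header.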
*** Bug 314715 has been marked as a duplicate of this bug. ***
For the record, this is the e-mail I sent to helpdesk@haaretz.co.il on 2005-09-14.
I sent a reminder to online@haaretz.co.il on 2005-09-22, and a second reminder (to helpdesk@haaretz.co.il, online bounced again) today.
I have not received any response so far.
Uri, try sending an email to a reporter from Captain Internet (Shachar Samocha, maybe?) and ask whether they are aware of this issue.
*** Bug 315864 has been marked as a duplicate of this bug. ***
Hi

I don't understand why you're not addressing the problem in FFox.

I understand that the headers say encoding X and the page says encoding Y. Why not just override the page's encoding with the one written in the HTML code? To me this looks like the right behaviour.

Consider the option where this is not an error: The author of the page really wants the browser to view the page in an encoding different than the one the server stipulates in the headers. Why not comply?
(I realize that in the case of haaretz site this IS probably an error).

I think the correct behaviour would be to use the header encoding if none was specified in the page (by HTML code), but to always allow the encoding specified in the HTML code to override.

Dan
(In reply to comment #13)
> I understand that the headers say encoding X and the page says encoding Y. Why
> not just override the page's encoding with the one written in the HTML code? To
> me this looks like the right behaviour.

It might look right to you, but it would be a clear violation of the W3C standards. See e.g. http://www.w3.org/TR/i18n-html-tech-char/#IDARVFO :
"According to the HTML specification, in a case of conflict the HTTP charset declaration has the highest priority of all means of declaring the character set."

Also, doing what you are suggesting will break compatibility with all other browsers (which do follow the standard), and might cause many sites to appear broken.
The actual standard says (http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2):
-----
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

   1. An HTTP "charset" parameter in a "Content-Type" field.
   2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
   3. The charset attribute set on an element that designates an external resource.
-----
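The priority list quoted above can be sketched as a small function. This is an illustration of the quoted rule only, not any browser's actual code; `pick_encoding` and its parameter names are invented for the example.

```python
def pick_encoding(http_charset=None, meta_charset=None, link_charset=None):
    """Return the first declared encoding, highest priority first.

    Order follows the HTML 4.01 text quoted above:
    1. HTTP "charset" parameter, 2. META http-equiv, 3. charset attribute
    on the element that designates the resource.
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return None  # nothing declared; the caller must decide what to do


# The haaretz.co.il case: the HTTP header wins over the META tag.
chosen = pick_encoding(http_charset="csISOLatinHebrew",
                       meta_charset="windows-1255")
print(chosen)
```

With these inputs the HTTP label is chosen, which is why the META declaration of windows-1255 has no effect on a conforming browser.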
(In reply to comment #14)
> 
> Also, doing what you are suggesting will break compatability with all other
> browsers (which do follow the standard), and might cause many sites to appear
> broken.
> 
What? MSIE displays Haaretz pages in the correct encoding, meaning that it doesn't follow the standard. 
(In reply to comment #16)
> What? MSIE displays Haaretz pages in the correct encoding, meaning that it
> doesn't follow the standard. 

MSIE displays Haaretz pages in the correct encoding for the same reason Firefox 1.0 did: it does not support quotes around encoding names in HTTP headers (bug 244964). So it does violate a standard, but not this one.

If you want to achieve bug-compatibility with IE, you'll have to suggest un-fixing bug 244964. But that too, according to that bug, will break some sites.
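A toy model of the quoting behaviour described in this comment, for illustration only: a parser that does not strip the quotes fails to recognise the HTTP label and falls back to the META declaration, while a quote-aware parser honours the header. The `KNOWN_LABELS` table and both function names are assumptions made for this sketch, not real browser internals.

```python
# Tiny illustrative label table; real browsers know many more labels.
KNOWN_LABELS = {
    "csisolatinhebrew": "iso-8859-8",
    "iso-8859-8": "iso-8859-8",
    "windows-1255": "windows-1255",
}


def http_charset(content_type, strip_quotes):
    """Extract and resolve the charset parameter from a Content-Type value."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset":
            value = value.strip()
            if strip_quotes:
                value = value.strip('"')
            # An unrecognised label (e.g. one still wrapped in quotes) gives None.
            return KNOWN_LABELS.get(value.lower())
    return None


def effective_encoding(content_type, meta_charset, strip_quotes):
    """Use the HTTP charset when it is recognised, otherwise the META charset."""
    return http_charset(content_type, strip_quotes) or meta_charset


header = 'text/html; charset="csISOLatinHebrew"'
without_quote_support = effective_encoding(header, "windows-1255", strip_quotes=False)
with_quote_support = effective_encoding(header, "windows-1255", strip_quotes=True)
print(without_quote_support, with_quote_support)
```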
(In reply to comment #16)
> What? MSIE displays Haaretz pages in the correct encoding, meaning that it
> doesn't follow the standard. 


MSIE isn't the only browser out there. Haaretz has been broken in Safari for a long time already, and when Mac users complained, they were told "but it works fine in Firefox". So be careful about making Firefox bug-compatible with MSIE.
*** Bug 317089 has been marked as a duplicate of this bug. ***
Alias: haaretz
(In reply to comment #17)
> (In reply to comment #16)
> > What? MSIE displays Haaretz pages in the correct encoding, meaning that it
> > doesn't follow the standard. 
> 
> MSIE displays Haaretz pages in the correct encoding for the same reason Firefox
> 1.0 did: it does not support quotes around encoding names in HTTP headers (bug
> 244964). So it does violate a standard, but not this one.

Actually, MSIE isn't compliant with the W3C spec: it gives the META declaration a higher priority than the HTTP header, which is why so many web sites (where MSIE is dominant) are broken in this respect. Anyway, has anybody contacted the webmaster of the site?

 
(In reply to comment #20)
> 
> Actually, MSIE isn't compliant to the W3C spec. and gives a higher priority to
> the META declaration over HTTP, which is why so many web sites (where MSIE is
> dominant) are broken in this respect. 

Hmmm, so I guess I was misinformed. I apologize.

> Anyway, has anybody contacted the webmaster of the site?

I tried to - see comment #10. After getting no response, today I sent an e-mail to the editor of the online edition (peterh@haaretz.co.il), in which I complained about the unresponsiveness of the technical team. I'll post an update if I get any response.

I sent mail to help@haaretz.co.il earlier today. This is supposed to be Captain Internet's support Q&A address. I'll post anything I get back.
I got a response from Oded, the editor of Captain Internet section as follows:

Ron hi,

We are aware of the problem

Unfortunately our computer people have yet to figure out how to mend it without damaging the rest of the site

In the meantime, a good Samaritan (hell, a great one) who goes by the name of Effie Nadiv wrote a patch for the problem

http://www.effie.co.il/zvuvu/mozilla/haaretz.html

we hope it will be fixed soon and I apologize for the inconvenience 

oded

From: Ron ...
Sent: Thursday, December 08, 2005 10:36 PM
To: eyal.saloniki@t...; eyal.saloniki@t...; odedy@h...
Subject: You have a bug

Hello,

I am an avid reader of your online edition and would like to point out a problem in your site.

Your website's HTTP server is using the wrong HTTP header encoding, which shows incorrectly on standard compliant browsers, such as Firefox 1.5. See

https://bugzilla.mozilla.org/show_bug.cgi?id=308187

For more details.

Please fix this issue.

Thanks,
Ron.
*** Bug 318102 has been marked as a duplicate of this bug. ***
*** Bug 319781 has been marked as a duplicate of this bug. ***
*** Bug 320269 has been marked as a duplicate of this bug. ***
I've also e-mailed haaretz, to no effect. I guess devoting 5 minutes to basic compliance with web standards is too much of a hassle for them. On the other hand they might just be employing overworked, underpaid, non-unionized monkeys (like their reporters and columnists).
Apparently this has been hashed over in their forums.
What they say there is that the webmasters are aware of the issue, but need to plan ahead for such a complicated change... Who knows what may happen if their server headers match their page headers...
They promise it's on their TODO list, but not in the near future...
Keep nagging them :-)
*** Bug 342747 has been marked as a duplicate of this bug. ***
For the record, last week I was in contact with one of Haaretz's content people, and he promised to pass the request on, but we are still far from seeing this issue resolved.

Haaretz appear to have fixed this on their end; I think this issue can be closed.
(In reply to comment #33)
> Haaretz appear to have fixed this on their end, I think this issue can be
> closed.

Hi Jermy! Nice to see you around!

Well, I'm not sure they actually fixed it on every page. For example, I found that the comments popup is still reversed on captain.co.il (and probably on other sites as well, but on captain.co.il more people mix Hebrew and English).

Here is a fresh example - http://themarker.captain.co.il/captain/objects/ResponseDetails.jhtml?resNo=4877847&itemno=1089232&cont=2
Hi Tomer,
Do you have any other examples of places this bug is still showing up apart from the comment pages?
OK, captain.co.il appears to be fixed now too.

Do you have any further examples?
I've asked on Twitter, then on my blog and in some other places. So far no one has been able to reproduce the problem, so it is now safe to close this bug. :)

My blog post about this issue (Hebrew; I can translate the important information in case you would like me to do it) - 
http://tomercohen.com/2009/06/06/%d7%97%d7%93%d7%a9%d7%95%d7%aa-%d7%aa%d7%90%d7%99%d7%9e%d7%95%d7%aa-%d7%90%d7%aa%d7%a8%d7%99%d7%9d/


Post by Effie Nadiv, the author of the extension that works around this issue - http://www.effie.co.il/?p=11 (English)



Thanks go to everyone involved in solving this issue!
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
I still see this on http://www.haaretz.com/captain/pages/indexCaptain.jhtml
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #38)
> I still see this on http://www.haaretz.com/captain/pages/indexCaptain.jhtml

Indeed, but I am wondering from where you linked to this page. captain.co.il is redirecting to http://themarker.captain.co.il/ which is good to me. (XP and not PX)
(In reply to comment #39)
> Indeed, but I am wondering from where you linked to this page.

I got it as the first Google result for "קפטן אינטרנט"
I did the same search and found the following article with the same problem. (note it is on the main Haaretz domain!)

http://www.haaretz.com/captain/pages/LiArtCaptain.jhtml?contrassID=11&subContrassID=1
Looks like it's fixed
(In reply to comment #42)
> Looks like it's fixed

I found yet another domain where the same issue still occurs. Do you have a contact person you can ask to reconfigure this server as well?

https://secure.haaretz.co.il/hasite/pages/tags/index.jhtml?tag=%E3%F4%E3%F4%F0%E9%ED
The secure.haaretz.co.il domain is still suffering from this issue...
I thought they fixed this issue, but lately (~6/2011) there has been a regression.
It still affects all the pages created by "HaHeadlines.jhtml", "HaSec.jhtml" & "ShArt.jhtml", but not "spages/*.html". Seems to be a problem whenever a "*.jhtml" is called. Examples:
http://www.haaretz.co.il/hasite/pages/HaHeadlines.jhtml?pageNumber=2&source=Lynx (Page 1 was .html)
http://www.haaretz.co.il/hasite/pages/HaSec.jhtml?pageNumber=2&contrassID=1&subContrassID=3 (Also pages 2 and above)
http://www.haaretz.co.il/hasite/pages/ShArt.jhtml?itemNo=1239199&contrassID=1&subContrassID=3&sbSubContrassID=0 (sometimes they link to articles using "ShArt", and not the usual "/spages/")
I have emailed "customer-htz@haaretz.co.il" & "online@haaretz.co.il", but received no reply...
Has anyone ever got them to acknowledge that they see this as a site bug?
Very annoying!
Ok,
Haaretz just launched a completely new site (not even a revamp, it's all completely new, and quite decent-looking). All pages are now encoded in UTF-8, and old links are redirected to the new site.
So, less than 24 hours after I joined the collective moaning about their lack of interest and non-responsiveness, I think this bug can (finally) be closed.
Phew!
(In reply to Asher Levy from comment #47)
> So, less then 24 hours after I joined the collective moaning about their
> lack of interest and non-responsiveness, I think this bug can (finally) be
> closed.
> Phew!

Thanks for the information. I've checked the previous reports, and everything seems to work well without encoding problems. Closing now; we'll reopen if the issue reappears.


(secure.haaretz.co.il has some minor reversed text issues, but since they redirect almost everything from there back to www.haaretz.co.il I don't count it)
Status: REOPENED → RESOLVED
Closed: 15 years ago → 13 years ago
Resolution: --- → FIXED
Product: Tech Evangelism → Tech Evangelism Graveyard