Closed Bug 121174 Opened 23 years ago Closed 4 years ago

dashes in some web pages replaced with question marks

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED INVALID
mozilla1.2alpha

People

(Reporter: jonabbey, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(5 files)

In build 2002012108 and over the last couple of weeks on Linux, I'm seeing dashes on the New York Times web pages replaced with question marks. I have seen this on some other web sites, but when I tried to create a simple test page (http://www.burrow.org/~jonabbey/mozilla_bug.html), it didn't show the bug. The URL at the New York Times will require free registration to access, but username 'cypherpunk', password 'cypherpunk', may allow you to access the above URL if the NYT hasn't caught onto this convention. ;-) -- Jon
font bugs to intl.
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: doronr → ruixu
ruixu: can you verify this? cc'ing shanjian
Reporter: Could you please let us know what is your Linux environment? e.g. Linux version/language, your working locale, etc., and could you please attach a screen shot for the problematic page? Thank you.
I can do, but I'm not sure that I should. That is, if I look at the New York Times' page with 'View Source', I see question marks in the raw HTML file. Given that, I wouldn't expect this to be a font issue. Changing fonts didn't help the situation. In addition, I have seen this both on my home system (Linux 2.2.18 on Athlon with NVidia GeForce2MX card running XFree86 4.1 with NVidia's drivers) and on my work system (stock RedHat 7.1). My home system has the LANG environment variable set to C, my work system has the LANG environment variable set to en_US. I'm reporting a problem with Mozilla on the basis that the NYT pages have question marks in them where Netscape 4.74 on this Linux box shows dashes. I have two guesses as to the etiology of this problem, given that I see the question marks in the view source window as well as in the rendered page. The first would be that something in Mozilla's networking layer is remapping the dash characters for some reason. The second would be that the New York Times' software is munging the dashes due to the Agent string that it is getting, i.e., that the NYT is transmitting question marks instead of dashes because it doesn't know what to make of Mozilla's Agent string. The second answer would seem more likely, except that I have seen the question marks somewhere else as well. Unless the NYT is using some standard software which is corrupting its data, the other page would argue against putting the blame on the NYT. If you can provide me with directions on how to change Mozilla's agent string, I can try having Mozilla simulate Netscape 4.74's agent string and hopefully either rule out or rule in the problem being on the New York Times' site. I was hoping someone else could check out the URL I cited and check to see if they were seeing the question marks or not. I will post the screenshot anon.
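One way to test the second guess outside the browser is to request the same URL with two different User-Agent headers and compare the raw bytes. The following is only a sketch: the Netscape 4.74 string is an assumed, illustrative value rather than a captured one, and `build_request`, `fetch_raw` and `differing_offsets` are hypothetical helper names.

```python
import urllib.request

# Assumed, illustrative User-Agent value; not captured from a real Netscape 4.74.
NETSCAPE_UA = "Mozilla/4.74 [en] (X11; U; Linux 2.2.18 i686)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_raw(url: str, user_agent: str, timeout: float = 10.0) -> bytes:
    """Fetch the undecoded response body (performs network I/O)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()


def differing_offsets(a: bytes, b: bytes) -> list:
    """Return offsets at which two equal-length-prefix bodies differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

If the server sent 0x3F ("?") to one agent and a dash byte to the other, `differing_offsets` applied to the two `fetch_raw` results would point at exactly those positions. Identical bodies would suggest the substitution happens on the client side.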
This screenshot shows the New York Times URL that I included in the initial bug report on my home Linux system running build 2002012208. The question mark at the very beginning of the text shows up as a dash when the same URL is viewed through Netscape 4.74.
Okay, here's another page that has question marks where I don't expect them: http://firingsquad.gamers.com/features/spineless/. See the first paragraph after 'Case in point: Licensing Restrictions', where we see EULA?s in place of EULA's. Now, I know that Netscape and Mozilla both have always declined to honor the "smart quote" characters used in web pages authored with MS tools, and I assume that this instance of question mark interpolation is a result of invalid HTML authored by MS FrontPage or MS Word. I point this out because I believe this sort of thing is what made me think that I had seen another page with erroneous question mark interpolation. I do note that the firingsquad page's inappropriate question marks do appear when I do a view source on the page, much as with the NYT's pages.
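The "smart quote" characters mentioned here are typically sent as single bytes in the 0x80-0x9F range, which are punctuation in Windows-1252 but control codes in ISO-8859-1. A minimal sketch for spotting them in a saved copy of a page follows; the sample bytes are a made-up fragment, not the actual firingsquad page, and `find_cp1252_punctuation` is a hypothetical helper.

```python
def find_cp1252_punctuation(raw: bytes) -> list:
    """Return (offset, byte, cp1252 character) for each byte in 0x80-0x9F."""
    hits = []
    for offset, byte in enumerate(raw):
        if 0x80 <= byte <= 0x9F:
            # A few values in this range are undefined in cp1252;
            # errors="replace" turns those into U+FFFD instead of raising.
            char = bytes([byte]).decode("cp1252", errors="replace")
            hits.append((offset, hex(byte), char))
    return hits


# Hypothetical fragment: 0x92 is a curly apostrophe and 0x97 an em dash in cp1252.
sample = b"EULA\x92s in place \x97 of dashes"
print(find_cp1252_punctuation(sample))
```

Run against the bytes as received from the server, this shows whether the page really contains "?" (0x3F) or a Windows-1252 punctuation byte that the browser then fails to display.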
Since these kinds of news web pages change so fast, I'm saving the page which has the problem.
When browsing the live site, I can see the "?" characters. Very interesting: if I save this page locally and open it through Navigator, then I don't see the "?".
The page will be displayed fine in this case. Btw, I'm using Linux RedHat7.1-JA, locale=ja_JP.eucjp. Reporter: Can you try the attached html file and see if you can still see the "?"? Thanks!
I realized the original, http://firingsquad.gamers.com/features/spineless/, is an ".asp" file, so I can see the "?" in that page, but when I saved it as an ".html" file some of the original data was lost, so the "?" no longer appears. But I still cannot reproduce the dashes showing as "?" in the New York Times page: http://www.nytimes.com/2002/01/22/science/space/22BLAC.html The dashes are displayed fine here.
Confirmed as New. There is a word "high-energy" in the attachment 66087 [details], please note that this dash mark cannot be changed to the question mark.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
QA Contact: ruixu → ylong
Is the NYT page perhaps using some kind of non-standard 'm-dash'? I know that TeX, etc., have special characters for dashes of varying lengths. And, yes, when I view the NYT page today, using Linux build 2002012308 on my work system, I do see a couple of simple dash characters instead of the single ? mark that I saw before. From my home system, however, using the same Linux build displaying on my work system's X display (slooowly), I do see the ? mark from the NYT URL given, even after I clear my disk and memory caches and force a reload of the page. I'm going to file this, then try changing my LANG variable to C and try to visit the page and see what I get.
Okay, setting my LANG environment variable to C to match my home system's setting didn't make a difference. I do notice, however, that the '--' I see on my work system on that NYT page is in actuality a single character which is rendered as a double-dash. Try selecting half of the double-dash.. you can't. On both systems, the font in use is adobe-times-iso8859-1. As I say, I tried having my home system display the page on my work X display and still got the question mark. Perhaps it is a matter of my profile. I'll move it aside and try again.
Okay, I tried visiting the NYT page from home with a new profile, using my work display for the X server. I saw the double-dash properly on the NYT page. I will have to investigate more when I get home to see if there is some font on my work display that my home system doesn't have, or if the newly created profile will properly show the double-dash on my home system. Perhaps this has something to do with Unicode fonting, I don't know. I do notice that if I see the double-dash properly rendered in the page, I will also see the double-dash properly rendered in the 'view source' window.
linux display issue, give to shanjian
Assignee: yokoyama → shanjian
accept.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
I had the same issue on WinXP with every build I have used, from .99 to 1.2a. I fixed it today by changing the default character coding to 'Western (windows-1252)'. Don't know if that is helpful in diagnosing the source of the problem.
I can't believe this is considered a bug. Given a specified encoding, only those characters that are part of that encoding should be rendered. The rest *should* be rendered as question marks or something. The given page has no encoding specified, which means whatever default encoding the user has specified will be used. This is usually ISO-8859-1, which doesn't contain the em dash, en dash, curly double quotation marks and so on. Those characters are part of the General Punctuation block in Unicode, and if one wants to use them in a page with ISO-8859-1 encoding, one *has* to escape them using entities (&mdash;) or numeric character references (&#8217;). This is an evangelism issue, IMNSHO. If NYT has such lousy web designers, it is their problem, not Mozilla's. They need to specify an encoding on their pages, and they need to escape special characters unless they are prepared to go native Unicode. From such big and wealthy organizations as NYT, one can and should expect better. Fire those "web designers" and hire some fresh talent with a rigid web design foundation. Mozilla, on the other hand, needs to concentrate on *real* bugs and stop working on exceptions and other quirks mode gibberish that only serve to bloat and slow down the product. Damn!
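The point about ISO-8859-1 lacking the em dash can be illustrated with Python's standard codecs. This is only a sketch of the encoding rule, not a claim about what NYT's software or Mozilla actually did; the sample text is made up.

```python
text = "high-energy \u2014 physics"  # U+2014 is the em dash

# ISO-8859-1 has no em dash, so a lossy encode substitutes "?".
lossy = text.encode("iso-8859-1", errors="replace")

# Escaping as a numeric character reference keeps the bytes pure ISO-8859-1
# while still letting a browser recover the original character.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(lossy)
print(escaped)
```

The plain hyphen in "high-energy" is ASCII and survives both encodes, which is consistent with the earlier observation that this dash never turns into a question mark.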
So why not just make the default western encoding for Mozilla ISO 8859-15? Wouldn't that take care of the problem? It seems to have for me.
I'm running Mozilla 1.3.1 on RH 8 on a Dell Inspiron (single boot - there's nothing Microsoft on this machine) and I get this problem at several sites - NYT, as noted, as well as Wall Street Journal and many others.
*** Bug 209624 has been marked as a duplicate of this bug. ***
*** Bug 227325 has been marked as a duplicate of this bug. ***
re comment #15: I wrote a few times to NYT to fix this issue, but they never replied. Currently, they specify 'iso-8859-1' on their front page (http://www.nytimes.com), but they don't specify it elsewhere. For instance, the following page is in Windows-1252, but neither ISO-8859-1 nor Windows-1252 is specified (in either the HTTP header or a meta tag). http://www.nytimes.com/2003/12/19/politics/19MIDE.html?hp Anyway, Mozilla internally treats ISO-8859-1 as synonymous with Windows-1252 (when converting to Unicode) so that just setting ISO-8859-1 works. The cause of Jonathan's problem is not a charset mismatch but the lack of fonts on his system covering characters like U+2014 (em dash). Installing some TrueType fonts (or bitmap fonts with 'windows-1252') should help.
shanjian has not been working on Mozilla for 2 years and these bugs are still here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
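The difference between the charsets discussed in this thread comes down to how byte 0x97 is decoded. A small sketch using Python's codecs (which, unlike the Mozilla behaviour described above, do not alias ISO-8859-1 to Windows-1252):

```python
import unicodedata

raw = b"\x97"  # the byte Windows-1252 uses for the em dash

as_cp1252 = raw.decode("cp1252")        # U+2014
as_latin1 = raw.decode("iso-8859-1")    # U+0097, a C1 control code
as_latin9 = raw.decode("iso-8859-15")   # also U+0097

print(unicodedata.name(as_cp1252))
print(unicodedata.category(as_latin1), unicodedata.category(as_latin9))
```

Strictly decoded, neither ISO-8859-1 nor ISO-8859-15 yields a dash for this byte; only the Windows-1252 interpretation produces U+2014, which then still needs a font that covers it.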
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Mass Reassign Please excuse the spam
Assignee: shanjian → nobody
Mass re-opening of bugs Frank Tang closed on Wednesday, March 02 for no reason. All the spam is his fault; feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Franks old bugs to Jungshik Shin for triage - Sorry for spam
Assignee: nobody → jshin1987
Status: REOPENED → NEW
I have noticed this problem on about 4 websites altogether, but when viewing these sites through Explorer, for example, I do not see this. I have never seen the string of characters; I usually just see the question marks replacing apostrophes, semi-colons, dashes and quotation marks. The attachment is too large, I presume, so here is the link. http://www.soulciti.com/forreel/review.php?id=25
QA Contact: amyy → i18n

We believe this bug is not relevant anymore. Please reopen if needed.

Status: NEW → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → INVALID