Closed
Bug 121174
Opened 23 years ago
Closed 4 years ago
dashes in some web pages replaced with question marks
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
INVALID
mozilla1.2alpha
People
(Reporter: jonabbey, Assigned: jshin1987)
References
()
Details
(Keywords: intl)
Attachments
(5 files)
In build 2002012108 and over the last couple of weeks on Linux, I'm seeing
dashes on the New York Times web pages replaced with question marks. I have
seen this on some other web sites, but when I tried to create a simple test page
(http://www.burrow.org/~jonabbey/mozilla_bug.html), it didn't show the bug.
The URL at the New York Times will require free registration to access, but
username 'cypherpunk', password 'cypherpunk', may allow you to access the above
URL if the NYT hasn't caught onto this convention. ;-)
-- Jon
Comment 1•23 years ago
font bugs to intl.
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: doronr → ruixu
Comment 2•23 years ago
ruixu: can you verify this?
cc'ing shanjian
Reporter:
Could you please tell us about your Linux environment (e.g. Linux
version/language, your working locale, etc.), and could you please attach a
screenshot of the problematic page? Thank you.
Comment 4•23 years ago (Reporter)
I can do, but I'm not sure that I should. That is, if I look at the New York
Times' page with 'View Source', I see question marks in the raw html file.
Given that, I wouldn't expect this to be a font issue. Changing fonts didn't
help the situation. In addition, I have seen this both on my home system (Linux
2.2.18 on Athlon with NVidia GeForce2MX card running XFree86 4.1 with NVidia's
drivers) and on my work system (stock RedHat 7.1). My home system has the LANG
environment variable set to C, my work system has the LANG environment variable
set to en_US.
I'm reporting a problem with Mozilla on the basis that the NYT pages have
question marks in them where Netscape 4.74 on this Linux box shows dashes. I
have two guesses as to the etiology of this problem, given that I see the
question marks in the view source window as well as in the rendered page. The
first would be that something in Mozilla's networking layer is remapping the
dash characters for some reason. The second would be that the New York Times'
software is munging the dashes because of the User-Agent string that it
is getting, i.e., that the NYT is transmitting question marks instead of dashes
because it doesn't know what to make of Mozilla's User-Agent string.
The second answer would seem more likely, except that I have seen the question
marks somewhere else as well. Unless the NYT is using some standard software
which is corrupting its data, the other page would argue against putting the
blame on the NYT.
If you can provide me with directions on how to change Mozilla's agent string, I
can try having Mozilla simulate Netscape 4.74's agent string and hopefully
either rule out or rule in the problem being on the New York Times' site. I was
hoping someone else could check out the URL I cited and check to see if they
were seeing the question marks or not.
I will post the screenshot anon.
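The User-Agent experiment proposed in this comment could be sketched roughly as follows. This is only an illustration using Python's standard library, not anything from Mozilla itself; the helper names `fetch_raw` and `first_difference` are hypothetical, and the User-Agent strings to compare would have to be supplied by whoever runs it.

```python
import urllib.request


def fetch_raw(url: str, user_agent: str) -> bytes:
    """Fetch a page and return the undecoded response body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One body may simply be a prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))


# Intended use (not executed here; both User-Agent strings are placeholders):
#   body_a = fetch_raw(page_url, "<first browser's User-Agent>")
#   body_b = fetch_raw(page_url, "<second browser's User-Agent>")
#   print(first_difference(body_a, body_b))
```

If the two bodies are byte-for-byte identical, the server is not varying its output by User-Agent and the difference must arise on the client side. Dynamic pages may differ for unrelated reasons (ads, timestamps), so a reported offset is only a starting point for inspection.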
Comment 5•23 years ago (Reporter)
This screenshot shows the New York Times URL that I included in the initial bug
report on my home Linux system running build 2002012208. The question mark at
the very beginning of the text shows up as a dash when the same URL is viewed
through Netscape 4.74.
Comment 6•23 years ago (Reporter)
Okay, here's another page that has question marks where I don't expect them.
http://firingsquad.gamers.com/features/spineless/. See the first paragraph
after 'Case in point: Licensing Restrictions', where we see EULA?s in place of
EULA's.
Now, I know that Netscape and Mozilla both have always declined to honor the
"smart quote" characters used in web pages authored with MS tools, and I assume
that this instance of question mark interpolation is a result of invalid HTML
authored by MS FrontPage or MS Word.
I point this out because I believe this sort of thing is what made me think that
I had seen another page with erroneous question mark interpolation. I do note
that the firingsquad page's inappropriate question marks do appear when I do a
view source on the page, much as with the NYT's pages.
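One way to test the "smart quote" suspicion is to look at the raw bytes of a saved copy of the page for values in the 0x80-0x9F range, which is where Windows-1252 keeps its curly quotes and dashes. The sketch below is a hypothetical helper in Python; the sample bytes are made up for illustration and are not taken from either page.

```python
import unicodedata


def c1_range_bytes(raw: bytes):
    """List (offset, byte value, Windows-1252 character name) for bytes in 0x80-0x9F."""
    hits = []
    for offset, value in enumerate(raw):
        if 0x80 <= value <= 0x9F:
            # A few values in this range are unassigned in Windows-1252,
            # so decode leniently instead of raising.
            char = bytes([value]).decode("cp1252", errors="replace")
            hits.append((offset, value, unicodedata.name(char, "UNDEFINED")))
    return hits


# Made-up sample: a curly apostrophe (0x92) and an em dash (0x97).
sample = b"EULA\x92s \x97 high-energy"
for hit in c1_range_bytes(sample):
    print(hit)
```

The plain hyphen in "high-energy" is ordinary ASCII (0x2D) and is not reported, which matches the observation that only some dashes are affected.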
Comment 7•23 years ago
Since these kinds of news web pages change so fast, I'm saving the page
which has the problem.
Comment 8•23 years ago
When browsing the live web page, I can see the "?" characters.
Very interesting: if I save this page locally and open it through Navigator,
then I don't see the "?".
Comment 9•23 years ago
the page will be displayed fine in this case.
Btw, I'm using linux RedHat7.1-JA, locale=ja_JP.eucjp
Reporter:
Can you try the attached html file and see if you still can see the "?",
thanks!
Comment 10•23 years ago
I realized that the original:
http://firingsquad.gamers.com/features/spineless/
is a ".asp" file, so I can see the "?" in that page, but when I saved it as a
".html" file, some of the original data was lost, so the "?" no longer appears.
But I still cannot reproduce the dashes showing as "?" in the New York Times page:
http://www.nytimes.com/2002/01/22/science/space/22BLAC.html
The dashes are displayed fine here.
Comment 11•23 years ago
Confirmed as New.
There is a word "high-energy" in attachment 66087; please note that this
dash mark is not changed to a question mark.
Comment 12•23 years ago (Reporter)
Is the NYT page perhaps using some kind of non-standard 'm-dash'? I know that
TeX, etc., have special characters for dashes of varying lengths.
And, yes, when I view the NYT page today, using Linux build 2002012308 on
my work system, I do see a couple of simple dash characters instead of the
single ? mark that I saw before. From my home system, however, using the same
Linux build displaying on my work system's X display (slooowly), I do see the ?
mark from the NYT URL given, even after I clear my disk and memory caches and
force a reload of the page.
I'm going to file this, then try changing my LANG variable to C and try to visit
the page and see what I get.
Comment 13•23 years ago (Reporter)
Okay, setting my LANG environment variable to C to match my home system's
setting didn't make a difference.
I do notice, however, that the '--' I see on my work system on that NYT page is
in actuality a single character which is rendered as a double-dash. Try selecting
half of the double-dash.. you can't.
On both systems, the font in use is adobe-times-iso8859-1. As I say, I tried
having my home system display the page on my work X display and still got the
question mark. Perhaps it is a matter of my profile. I'll move it aside and
try again.
Comment 14•23 years ago (Reporter)
Okay, I tried visiting the NYT page from home with a new profile, using my work
display for the X server. I saw the double-dash properly on the NYT page. I
will have to investigate more when I get home to see if there is some font on my
work display that my home system doesn't have, or if the newly created profile
will properly show the double-dash on my home system.
Perhaps this has something to do with Unicode fonting, I don't know.
I do notice that if I see the double-dash properly rendered in the page, I will
also see the double-dash properly rendered in the 'view source' window.
Comment 17•23 years ago
I had the same issue on WinXP with every build I have used, .99 - 1.2a. I fixed
it today by changing the default character coding to 'Western (windows-1252)'.
Don't know if that is helpful in diagnosing the source of the problem.
Comment 18•23 years ago
I can't believe this is considered a bug.
Given a specified encoding, only those characters that are part of that encoding
should be rendered. The rest *should* be rendered as question marks or something.
The given page has no encoding specified, which means whatever default encoding
the user has chosen will be used. This is usually ISO-8859-1, which doesn't
contain characters such as the em dash, en dash, or curly quotation marks. Those
characters are part of the General Punctuation block in Unicode, and if one
wants to use them in a page with ISO-8859-1 encoding, one *has* to escape them
using entities (&mdash;) or numeric character references (&#8217;).
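The point about ISO-8859-1 lacking these characters can be illustrated with Python's built-in codecs (used here only as a stand-in; this says nothing about how any particular site generates its pages). Encoding an em dash to ISO-8859-1 with lossy replacement yields a literal question mark, while escaping it as a numeric character reference keeps the page pure ISO-8859-1 without losing the character.

```python
text = "black holes \u2014 high-energy"  # contains U+2014 EM DASH

# Lossy: characters missing from ISO-8859-1 are replaced with "?".
lossy = text.encode("iso-8859-1", errors="replace")
print(lossy)

# Escaped: missing characters become numeric character references.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(escaped)
```

A server-side conversion of the first kind would explain question marks that are visible even in 'View Source', though nothing in this bug confirms that is what happened.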
This is an evangelism issue, IMNSHO. If NYT has such lousy web designers, it is
their problem, not Mozilla's. They need to specify an encoding on their pages,
and they need to escape special characters unless they are prepared to go native
Unicode. From such big and wealthy organizations as NYT, one can and should
expect better. Fire those "web designers" and hire some fresh talent with a
solid web design foundation.
Mozilla, on the other hand, needs to concentrate on *real* bugs and stop working
on exceptions and other quirks mode gibberish that only serve to bloat and slow
down the product.
Damn!
Comment 19•23 years ago (Reporter)
So why not just make the default western encoding for Mozilla ISO 8859-15?
Wouldn't that take care of the problem? It seems to have for me.
Comment 20•22 years ago
I'm running Mozilla 1.3.1 on RH 8 on a Dell Inspiron (single boot - there's
nothing Microsoft on this machine) and I get this problem at several sites -
NYT, as noted, as well as Wall Street Journal and many others.
Comment 21•21 years ago
*** Bug 209624 has been marked as a duplicate of this bug. ***
Comment 22•21 years ago
*** Bug 227325 has been marked as a duplicate of this bug. ***
Comment 23•21 years ago (Assignee)
re comment #15: I wrote to NYT a few times asking them to fix this issue, but
they never replied. Currently, they specify 'iso-8859-1' on their front page
(http://www.nytimes.com), but they don't specify it elsewhere. For instance, the
following page is in Windows-1252, but neither ISO-8859-1 nor Windows-1252 is
specified (in either the HTTP header or a meta tag).
http://www.nytimes.com/2003/12/19/politics/19MIDE.html?hp
Anyway, Mozilla internally treats ISO-8859-1 as synonymous with Windows-1252
(when converting to Unicode) so that just setting ISO-8859-1 works. The cause of
Jonathan's problem is not charset mismatch but the lack of fonts on his system
covering characters like U+2014 (em dash). Installing some truetype fonts (or
bitmap fonts with 'windows-1252') should help.
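The difference between the two charsets that this comment relies on can be seen with Python's codecs, used here purely as an illustration and not as a model of Mozilla's converter. Under strict ISO-8859-1 the byte 0x97 is an invisible C1 control character; under Windows-1252 the same byte is the em dash, U+2014.

```python
import unicodedata

raw = b"\x97"

# Strict ISO-8859-1 maps each byte to the code point of the same value.
as_latin1 = raw.decode("iso-8859-1")
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))

# Windows-1252 assigns printable punctuation to most of 0x80-0x9F.
as_cp1252 = raw.decode("cp1252")
print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))
```

Once the byte has been decoded to U+2014, whether it is drawn as a dash or as a fallback glyph depends on the installed fonts, which is consistent with the font explanation given here.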
Comment 24•20 years ago
shanjian has not been working on Mozilla for 2 years and these bugs are still
here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Comment 26•20 years ago
Mass re-opening bugs Frank Tang closed on Wednesday, March 02 for no reason. All
the spam is his fault; feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 27•20 years ago
Reassigning Frank's old bugs to Jungshik Shin for triage - Sorry for spam
Assignee: nobody → jshin1987
Status: REOPENED → NEW
Comment 28•17 years ago
I have noticed this problem on about 4 websites altogether, but when viewing these sites through Explorer, for example, I do not see this. I have never seen the string of characters; I usually just see the question marks replacing apostrophes, semi-colons, dashes and quotation marks. The attachment is too large, I presume, so here is the link.
http://www.soulciti.com/forreel/review.php?id=25
Updated•16 years ago
QA Contact: amyy → i18n
Comment 29•4 years ago
We believe this bug is not relevant anymore. Please reopen if needed.
Status: NEW → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → INVALID