Closed Bug 275650 Opened 20 years ago Closed 17 years ago

grammatical-evolution.org - Mozilla fails to render page, just shows raw HTML, I.E. renders page OK

Categories

(Tech Evangelism Graveyard :: English US, defect)

x86
Windows 98
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: xanthian, Unassigned)

References

()

Details

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a6) Gecko/20041127
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a6) Gecko/20041127

For the subject URL, the Mozilla browser begins to render the page, but
quickly reverts to just displaying the raw HTML code. The same page
displays appropriately rendered when Internet Explorer is used instead.

Reloading the page several times made no difference.

Reproducible: Always

Steps to Reproduce:
1. Open URL

Actual Results:  
After a brief flash of the header of the rendered page, the browser
reverts to just displaying the HTML. A brief glance at the raw
code shows no obvious problems.

Expected Results:  
Rendered the page or showed an error message saying why it could not
do so.


Nothing fancy going on, this is a fairly new download, with a theme that
has caused no problems in months of use.
WFM 20041220 PC/WinXP
WFM Mozilla 1.8a6 2004121205 on WinNT4
Version: unspecified → Trunk
Also cannot reproduce in a current trunk build on Linux....
(In reply to comment #3)
> Also cannot reproduce in a current trunk build on Linux....

Okay, I've since upgraded to nightly build
2005011605 on MS-Win98SE, and the bug is still
reproducible for me.

I wish I could do a movie for you; the page starts
to render, fills the visible window (and
presumably keeps parsing and rendering offscreen),
then the renderer finds something it cannot digest,
and the page is repainted top-down as the raw HTML
text. Later attempts to reload show just the raw
HTML, probably as I'd expect if I understood
the caching better.

This in a freshly rebooted system in which I'd only
done my DSL connection, and opened KevTerm to read
the email copy of your remark, before trying this
test, so I don't think the odds that something is
corrupted in fragile Win98 before this test run are
too high, given that the error recurs with different
startups.

I'm very suspicious that there is indeed something
broken in the HTML on that page, but I've lost track
of (or perhaps invented the existence of) the W3C
page that proofreads HTML for you, so I don't have
an automated way to "parse it for errors", that
would indicate the location of the error, and the
page is a bit big to proofread accurately by eye.

Of course, Mozilla should fail more gently than this
even if the HTML is in error, I'd think, preferably
with an error note, either separately via a message
box to the user or else embedded in the rendered
page.

Let me know if there's more info I can supply.

FWIW

xanthian.

Kent, do you have HTTP pipelining turned on, by any chance?

As for the W3C HTML validator, it's at http://validator.w3.org/ but I agree that
we shouldn't be displaying garbage even on somewhat invalid HTML... ;)
(In reply to comment #5)

Hi, Boris.

> Kent, do you have HTTP pipelining turned on, by
> any chance?

Not unless Mozilla comes with it enabled "out of the
box"; I had to go read your FAQ even to know what
the term meant. I surely haven't played with
wherever it is controlled.

> As for the W3C HTML validator, it's at
> http://validator.w3.org/

Thanks for that, at least I didn't imagine
it out of whole cloth.

> but I agree that we shouldn't be displaying
> garbage even on somewhat invalid HTML... ;)

Wouldn't that be an ideal world, though!

If you've read Hofstadter's Gödel, Escher, Bach,
and the record-player parable there, you know what
the odds of that ever happening look like.

xanthian.

OK.

Kent, if you save the file to disk (using wget or another browser or something),
can Mozilla load it from disk correctly?

If it can, could you do an HTTP log of the failing pageload per the instructions
at http://www.mozilla.org/projects/netlib/http/http-debugging.html and attach it
to this bug?
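For concreteness, the linked instructions amount to setting two NSPR environment variables before launching the browser, then loading the failing page. A minimal Python sketch of that procedure (the module name `nsHttp` and verbosity level `5` come from that page; the log path and Mozilla executable path here are placeholder assumptions):

```python
import os
import subprocess

# Environment for the child process: log the HTTP module at maximum verbosity.
env = dict(os.environ,
           NSPR_LOG_MODULES='nsHttp:5',   # which NSPR module to log, and level
           NSPR_LOG_FILE=r'C:\log.txt')   # placeholder log destination

# Placeholder install path -- adjust to the actual Mozilla location.
exe = r'C:\Program Files\mozilla.org\Mozilla\mozilla.exe'
if os.path.exists(exe):
    # Mozilla inherits the variables and writes its HTTP traffic to the log.
    subprocess.run([exe], env=env)
```

After reproducing the failure, the resulting `log.txt` is what gets attached to the bug.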
Per request from Boris, per instructions at link he provided.
Just a screen capture to show this bug is "real", since it
is a "WFM" on some testers' systems, but not on my
MS-Win98SE system.
(In reply to comment #7)
> OK.

> Kent, if you save the file to disk (using wget or
> another browser or something), can Mozilla load it
> from disk correctly?

I downloaded a local copy with wget, and modulo that
the local references for the two images on the page
no longer worked, the page rendered correctly
(to the eye) from my local copy.

The two images are located toward the top of the
page, and though when rendering from the remote copy
is failing, events happen too fast (a fraction of a
second) to see whether they are displayed, I suspect
they are not involved in the failure, simply because
successful rendering goes well past that location
before failing. That may not be a correct inference
on my part, though.

To make sure that the problem hadn't changed out
from under my testing, I also then went back and
confirmed that the failure still occurred as
previously described, when the remote copy was
accessed, and such is the case.

> If it can, could you do an HTTP log of the failing
> pageload per the instructions at
> http://www.mozilla.org/projects/netlib/http/http-debugging.html
> and attach it to this bug?

Okay, I have done this, cutting and pasting from a
non-mozilla copy of the instructions to assure I got
it correct (kudos to the author, it worked as
described) and will attach log.txt before adding
this comment.

The "instrumented" version didn't behave exactly the
same, though, there was no "flash of a rendered
version", only the final display of the raw HTML. It
would be good if whoever reviews log.txt can check
that some local cached/already (not)rendered version
was not the source of the different behavior, if I
said that so it makes sense.

Since this bug doesn't reproduce for everyone, I am
also attaching a screen capture of the failure to
confirm its reality.

And by the way, with the logging no longer
occurring, when I made that screen capture, again
there was no "flash of rendered text" as in the
first access attempt, just a "straight to the raw
HTML" display.

HTH

xanthian.
The log shows us just reading from the cache....  Does clearing cache and then
loading the site show the problem too?
(In reply to comment #5)

> As for the W3C HTML validator, it's at
> http://validator.w3.org/

I ran that validator on the failing page, it told me
two things:

    No Character Encoding Found! Falling back to
    UTF-8.

    I was not able to extract a character encoding
    labeling from any of the valid sources for such
    information. Without encoding information it is
    impossible to reliably validate the document.
    I'm falling back to the "UTF-8" encoding and
    will attempt to perform the validation, but this
    is likely to fail for all non-trivial documents.

and

    Sorry, I am unable to validate this document
    because on line 176  it contained one or more
    bytes that I cannot interpret as utf-8 (in other
    words, the bytes found are not valid values in
    the specified Character Encoding). Please check
    both the content of the file and the character
    encoding indication.

A look at the indicated line of my local copy with
vim() shows nothing scarier than a couple of raw
umlauted letter "a"s (which vim() is happy to render
as intended using whatever glyph code sheet is its
default). There may be something invisible, but more
than likely, this is what choked Mozilla. That also
is far enough down the document to be out of sight
in the original partial rendering attempt, so it is
consistent with the evidence of the failure.

Granted, the page author should have used HTML
entities rather than the raw non-ASCII characters
from his/her European keyboard, but is this an
expected "Mozilla killer"?

FWIW

xanthian.

Re: comment 11

That's what I was afraid of; let me try again after
some puttering around to get back to a test situation.

xanthian.
Re: comment 11

But before I go...

Even read out of cache, the rendering fails, while when read from
a local copy, it succeeds. Why the difference?

Moreover, the failure mode changes when read from cache.

Please tell me you don't cache the already (failed to be-)rendered
version!

My comments online and off about bug 271239 are just dying to
re-erupt.
We cache the exact bytes the server sent.  So if the server sent "bogus" bytes
once, we'd cache it till the expiration time for that data.
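As a toy model of that statement (purely illustrative, not Mozilla's actual cache code): the cache keys on the URL and stores the exact response headers and body bytes; every later load re-serves and re-interprets those same bytes until the entry expires.

```python
cache = {}  # url -> (headers, body bytes), kept until expiration

def load(url, network_fetch):
    """Return the response for url, hitting the network only on a cache miss."""
    if url not in cache:
        cache[url] = network_fetch(url)
    return cache[url]

# A server that once answered with a bogus header keeps "answering"
# that way from cache on every subsequent load:
def flaky_fetch(url):
    return ({'Content-Type': 'text/plain'}, b'<html>...</html>')

headers, body = load('http://example.test/pubs.html', flaky_fetch)
headers2, body2 = load('http://example.test/pubs.html', flaky_fetch)
# Both loads see the identical bytes and the identical (wrong) header.
```

This is why clearing the cache was the relevant experiment: it forces a fresh fetch and a fresh chance at a correct response.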
(In reply to comment #14)
> We cache the exact bytes the server sent.  So if the server sent "bogus" bytes
> once, we'd cache it till the expiration time for that data.

Well, that makes me quite a bit happier, thanks.

xanthian.
This looks much nicer, being almost six times as long, but
the failure mode still omitted the initial partial rendering,
so something remains changed from the failure under a completely
clean start; was
   edit=>preferences=>advanced=>cache=>clear cache;OK
insufficient?

Sorry for the delay returning this, but something, perhaps
Mozilla, since that's what I was using at the time, forced
me to reboot [twice, after the first reboot neither Mozilla
nor my DSL connection would come up]. Playing with Mozilla
can be hazardous to one's free time.
No, clearing the cache should be sufficient (and this shows HTTP requests being
made).  Looks like we're getting ok data from the site, though.

If I may ask, what does View > Character Coding say for you on this page?  If it
doesn't show ISO-8859-1, what are your charset autodetect settings?
(In reply to comment #17)

> No, clearing the cache should be sufficient (and
> this shows HTTP requests being made).  Looks like
> we're getting ok data from the site, though.

Then why no "flash of rendered version" even after I
did that? Shouldn't the behavior have reverted to
"first time seen" behavior?

> If I may ask, what does View => Character Coding
> say for you on this page?  If it doesn't show
> ISO-8859-1, what are your charset autodetect
> settings?

If I understand what I'm seeing, the button with
center filled is opposite "Western Windows 1252",
and the filled button in the charset autodetect menu
is (!!!) Japanese! I didn't do that, I don't read
Japanese, and no one has been in the house since I
last replaced my current nightly build, but some
few weeks ago a Japanese friend who visits regularly
installed into the MS-Windows environment Japanese
character ability so he could use his Hotmail/Yahoo!
Mail Japanese account and read the text here. Would
that have changed a browser setting, or would he
have been instructed to change a browser setting?
(why), and would that have lingered (how, though I
probably can guess in a general sense), and would
that have led to other pages being broken on arrival
despite that they were _not_ Japanese (oops)?

Remember also the one complaint from the W3C
validator that there wasn't a source of character
set information, which is independent of my whole
browser setup (if that complaint makes sense at
all), and thus shouldn't depend on my settings,
though perhaps my settings are indeed making that
lack unsurvivable for Mozilla's rendering pass.

Curiouser and curiouser. Nice query, though, I'd
_never_ have gone looking there.

Oh, and if I set that to something else, what should
it be, and will doing so break his ability to read
his Kanji(?) email?

xanthian.
(In reply to comment #18)

Okay, not to sit here passive, I changed the settings to

  view => character encoding => autodetect => off

cleared the cache, reloaded, and the problem remained;

then I additionally changed

  view => character encoding => Western (ISO-8859-1)

again cleared the cache, reloaded, and the page now
displayed correctly! Thus, the settings you asked about
were indeed the immediate cause of the symptoms. Now, is
that purely a "user error", not a bug, or should
Mozilla:

1) Have rendered the page correctly anyway, or
2) Failed to render the page correctly, but put up a
   useful message explaining why?

If a "user error", not a bug, how does the user who
needs dual language use avoid that error?

Since Mozilla with those settings has been rendering
other pages without problems, I assume that the
non-ASCII raw text on that page is interacting with
the character encoding settings in an unhappy way.

Should that be the case? Should that case elicit a
"check your character encoding settings" prompt?

HTH

xanthian.

(In reply to comment #12)

> A look at the indicated line of my local copy with
> vim() shows nothing scarier than a couple of raw
> umlauted letter "a"s (which vim() is happy to render
> as intended using whatever glyph code sheet is its
> default).

I went back to check, and the word "Jyväskylä" with the
two raw text umlauted "a"s was indeed correctly rendered
on the page, specifically.

HTH

xanthian.
> or would he have been instructed to change a browser setting?

He almost certainly changed the browser setting, yes.

So what's happening here is that the page says absolutely nothing about the
encoding it's in.  Since autodetect is enabled, and set to Japanese, when we hit
non-ASCII bytes we try to guess which exact Japanese encoding the page is in by
looking at the non-ASCII byte pattern and picking the Japanese encoding that is
most likely to have that byte pattern.  Then we reparse the page from the
beginning (this is the "flash" thing you saw), in this new encoding.

It seems that the particular encoding in question (the one that was most closely
resembled by the decidedly non-Japanese text on the page) leads to garbage
display in this case, most likely because a '<' char somewhere is treated as
part of a character in this encoding or something along those lines.

I'm afraid this is in fact user error -- the user has told us to try decoding
all pages as Japanese unless the pages say otherwise, and we're doing our best
to do that...

ccing some intl folks to confirm, though.
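The guessing step above can be caricatured like this (a deliberately crude sketch: Mozilla's real detector is statistical, not a try-in-order loop, and the encoding names here are Python codec names):

```python
def guess_japanese(data: bytes):
    """Return the first Japanese codec whose byte grammar accepts data."""
    for enc in ('iso2022_jp', 'euc_jp', 'shift_jis'):
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None  # nothing fit

# The page's "Jyväskylä" bytes (actually ISO-8859-1) happen to be
# well-formed Shift_JIS, so a Japanese-only detector settles on that:
print(guess_japanese(b'Jyv\xe4skyl'))  # -> shift_jis
```

Once a guess is made, the page is reparsed from the start in that encoding, which is the "flash" the reporter describes.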
I'll look at this more closely in a little while, but I notice right off that
the server is sending a header "Content-Type: text/plain", which can't be helping :)
Hmm, this is very odd. The page in question has only two 8-bit octets (with MSB
set), both of which are 0xE4. They're followed by non-8-bit octets ('s' and ','),
which are invalid in ISO-2022-JP and EUC-JP. Because of that, they're rendered
as 'an invalid char' when the encoding is set to EUC-JP or ISO-2022-JP. In
Shift_JIS, they're valid and rendered as Japanese Kanji, as expected.
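That analysis is easy to confirm at the byte level. A small Python check (the byte pair 0xE4 + 's' is taken from the page's "Jyväskylä"; the codec names are Python's):

```python
pair = b'\xe4s'  # "äs" from "Jyväskylä" as encoded in ISO-8859-1

# The author's intended reading:
assert pair.decode('iso-8859-1') == 'äs'

# EUC-JP requires the byte after 0xE4 to lie in 0xA1-0xFE, so the
# ASCII 's' (0x73) makes the sequence invalid:
try:
    pair.decode('euc_jp')
    raise AssertionError('unexpectedly valid')
except UnicodeDecodeError:
    pass  # rendered as "an invalid char"

# Shift_JIS treats 0xE4 as a double-byte lead and 0x73 as a valid
# trail byte, so the pair decodes as a single Kanji:
assert len(pair.decode('shift_jis')) == 1
```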

With the cache cleared and Japanese autodetector on, however, I could reproduce
the problem. If earlier on it was detected as EUC-JP / ISO-2022-JP, 0xE4 + 's'
and 0xE4 + '.' should be just rendered as 'an invalid char'. If it was
Shift_JIS, everything is valid...

Simon, how did you get a 'text/plain' response? attachment 171927 [details] and the result I
got at 'http://websniffer.org' have 'Content-Type: text/html'. If the server
'sometimes' emits 'text/plain' (although most times, it emits 'text/html'), that
may explain what we have here. 

(In reply to comment #23)
> If the server
> 'sometimes' emits 'text/plain' (although most times, it emits 'text/html'), that
> may explain what we have here. 

This seems to be the case. First load of
http://sniffuri.org/view.cgi?url=http%3A%2F%2Fwww.grammatical-evolution.org%2Fpubs.html
gives me text/html in the headers, but reloading gives text/plain.

I'm guessing this is some misconfiguration on the server which is being
triggered by the reload that can occur with charset auto-detection.
Ah, yes.

Over to evangelism.
Assignee: general → english-us
Status: UNCONFIRMED → NEW
Component: General → English US
Ever confirmed: true
Product: Mozilla Application Suite → Tech Evangelism
QA Contact: general → english-us
Summary: Mozilla fails to render page, just shows raw HTML, I.E. renders page OK → grammatical-evolution.org - Mozilla fails to render page, just shows raw HTML, I.E. renders page OK
Version: Trunk → unspecified
(In reply to comment #25)
> Ah, yes.
> Over to evangelism.

Well, thanks for that at least, I was expecting an INVALID!

So despite it being a user-self-inflicted problem, it's
still interesting. That's good, Boris, all your efforts
haven't been a waste.

xanthian.
(In reply to comment #21)
> So what's happening here is that the page says
> absolutely nothing about the encoding it's in.
> Since autodetect is enabled, and set to Japanese,
> when we hit non-ASCII bytes we try to guess which
> exact Japanese encoding the page is in by looking
> at the non-ASCII byte pattern and picking the
> Japanese encoding that is most likely to have that
> byte pattern.  Then we reparse the page from the
> beginning (this is the "flash" thing you saw), in
> this new encoding.

Hmm, later comments indicate that there is also some
interaction with the server's claim for what it is
sending, so once it stops claiming to be sending
text/html, and starts claiming to be sending
text/plain, the initial attempt to render (the
"flash") doesn't happen any more. It would be of
some interest to see whether there is _ever_ an
attempt to render when the file is first downloaded
by some non-Mozilla means, and then accessed from
local disk, there being then no "server" involved,
and since as noticed the file contains no internal
typifying of its own contents.

> It seems that the particular encoding in question
> (the one that was most closely resembled by the
> decidedly non-Japanese text on the page) leads to
> garbage display in this case, most likely because

That's a bit misleading, no biggie, but it isn't
displaying "garbage" (except possibly offscreen
where I don't see it, and depending as noted in
another comment perhaps on which flavor of
"Japanese" encoding it guesses it is seeing).
Instead, what ends up finally on the screen is
just the raw text of the HTML code, no "garbage"
at all, even where the two offending umlauted "a"s
are.

> a '<' char somewhere is treated as part of a
> character in this encoding or something along
> those lines.

Such seems to be the case.

> I'm afraid this is in fact user error -- the user
> has told us to try decoding all pages as Japanese
> unless the pages say otherwise, and we're doing
> our best to do that...

Sigh. Sorry for the inconvenience, then.

I'm still inclined to argue that Mozilla should
comment on what went wrong, though, when it decides
not to render the page at all and just displays the
raw HTML after detecting from the contents that the
page _was_ HTML (if such is actually the case), so
the user has a clue. Something like

    "You told me to interpret this web page as
    [Japanese, e.g.], but then when I did that,
    some of the characters came out invalid in
    that encoding, so I'm just displaying the
    page uninterpreted as a fallback."

would have been nice, let the user know _why_ what
happened had happened, pointed the user mildly toward
a fix ("what do you _mean_, 'Japanese'?!?") and
avoided this bug report. Mozilla going into a
failure mode fallback silently isn't as helpful.

One might want to condition such warnings on a
"verbose" preferences setting, but there's then an
issue of how to make the naive user aware that
enabling that setting is a good first start when
Mozilla does "spooky stuff".

> ccing some intl folks to confirm, though.

Thanks, since that seems to have raised other
issues among folks more familiar with the
internationalization pitfalls.

xanthian.


>    "You told me to interpret this web page as
>    [Japanese, e.g.], but then when I did that,
>    some of the characters came out invalid in
>    that encoding, so I'm just displaying the
>    page uninterpreted as a fallback."

But that's not what happened. Mozilla does not do that. It's probably the fact
that the server sometimes sends the page as text/plain, as mentioned above...
re comment #27
You're misinterpreting what we figured out (comment #23 and comment #24). bz's
analysis in comment #21 was not quite right.  It's NOT invalid octets that make
Mozilla render HTML as 'text/plain'. If Mozilla did that, it would be a very
serious bug and we should fix it.
(In reply to comment #28)

> But that's not what happened. Mozilla does not do
> that. it's probably the fact that the server
> sometimes sends the page as text/plain, as
> mentioned above...

Well, no, that wouldn't explain why the failure
still occurred when the HTML was being read from a
local copy I downloaded with wget() per Boris'
request, still showing me the raw HTML, just without
the prior rendered flash. Whatever the problem is,
the behavior of showing the raw HTML is associated
with the file contents, I just turned the "Japanese"
stuff back on and checked other simple and complex
local HTML pages that don't have a text type
declaration, and they render without problems, as
expected.

The behavior of _flashing first_ does seem to be
"server" related, since I cannot duplicate that at
all from the local copy of the same web page.

FWIW

xanthian.

I can't reproduce the problem if the file (pubs.html) is on my local disk or on
my server ( http://jshin.net/moztest/275650.html ).
(In reply to comment #31)
> Created an attachment (id=172079) [details]
> the html file in question
> 
> I can't reproduce the problem if the file (pubs.html) is on my local disk or on
> my server ( http://jshin.net/moztest/275650.html ). 

Ditto, and likewise at the URL.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070222 SeaMonkey/1.1.1 XpcomViewer/0.8.9
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Product: Tech Evangelism → Tech Evangelism Graveyard