Closed Bug 284688 Opened 20 years ago Closed 20 years ago

mozilla ignores mime type sent by server

Categories

(SeaMonkey :: General, defect)

x86
FreeBSD
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: neuhauser, Unassigned)

Details

User-Agent:       Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8a6) Gecko/20050131
Build Identifier: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8a6) Gecko/20050131

requesting http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab.shar
summons the "what should I do?" dialog saying, "The file "..." is of type
application/x-shar (...)", BUT the server says the file is text/plain:

roman@isis ~ 1002:1 > telnet codex 80                                         
Trying 81.95.102.106...
Connected to codex.sigpipe.cz.
Escape character is '^]'.
HEAD /FreeBSD/ports/sysutils/pecl-statgrab.shar HTTP/1.1
Host: codex.sigpipe.cz

HTTP/1.1 200 OK
Date: Thu, 03 Mar 2005 23:29:16 GMT
Server: Apache
Last-Modified: Tue, 22 Feb 2005 23:49:39 GMT
ETag: "45b8f4-1456-3bdd2ac0"
Accept-Ranges: bytes
Content-Length: 5206
Content-Type: text/plain

Connection closed by foreign host.
roman@isis ~ 1002:1 > 
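The same header check can be scripted rather than typed into telnet. This is a minimal sketch using Python's standard http.client and email modules. The helper names are made up for illustration, and nothing here reflects how Mozilla itself reads the header.

```python
import email
import http.client
from typing import Optional


def head_content_type(host: str, path: str, port: int = 80) -> Optional[str]:
    """Send a HEAD request and return the Content-Type header the server sent."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Content-Type")
    finally:
        conn.close()


def content_type_from_transcript(raw_response: str) -> Optional[str]:
    """Extract Content-Type from a pasted response like the telnet output above."""
    # Drop the status line ("HTTP/1.1 200 OK"); what remains is a header block.
    _status_line, _, header_block = raw_response.lstrip().partition("\n")
    return email.message_from_string(header_block)["Content-Type"]
```

Calling head_content_type("codex.sigpipe.cz", "/FreeBSD/ports/sysutils/pecl-statgrab.shar") would report whatever the server sends at the time of the request.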


Reproducible: Always

Steps to Reproduce:
1. http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab.shar

Actual Results:  
see the "details" section

Expected Results:  
see the "details" section
yeah... it does that for text/plain if the data doesn't look like text... see
http://www.mozilla.org/docs/web-developer/mimetypes.html#http
Yep, we purposefully do this for the case when the server is Apache, to work
around a bug in the default Apache configuration....  See
http://issues.apache.org/bugzilla/show_bug.cgi?id=13986 and
http://issues.apache.org/bugzilla/show_bug.cgi?id=14095
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Well, that's... very Internet Explorer-like.

Now, since I don't enjoy this misbehavior, and my Apache is perfectly ok, how do
I turn this on-purpose bug in Mozilla off? I don't particularly care if it's
broken by default in this regard, just give me a knob I can turn to make it
behave as it should.
> Well, that's... very Internet Explorer-like.

Being insulting is a good way to get people to deal with you, eh?

Consider the situation here.  We spent several years trying to convince the
developers of what is by far the most popular web server around to fix a major
bug in their server (please do see the Apache bugs I cited).  When they made it
abundantly clear that they had no plans to ever fix it, we implemented a
workaround (just like we implement workarounds for various other bugs in HTTP
implementations, including bugs in the content-disposition handling in Apache,
bugs in pipelining support in various servers, bugs in Content-Location support,
etc).  This is quite different from what IE does.

> and my Apache is perfectly ok

Actually, it's not clear to me that it is.  The definition of the text/plain
type says:

   Note that the control characters including DEL (0-31, 127) have no defined
   meaning apart from the combination CRLF (ASCII values 13 and 10)
   indicating a new line.  Two of the characters have de facto meanings
   in wide use: FF (12) often means "start subsequent text on the
   beginning of a new page"; and TAB or HT (9) often (though not always)
   means "move the cursor to the next available column after the current
   position where the column number is a multiple of 8 (counting the
   first column as column 0)." Apart from this, any use of the control
   characters or DEL in a body must be part of a private agreement
   between the sender and recipient.  Such private agreements are
   discouraged ...

(see http://www.faqs.org/rfcs/rfc1521.html).  Note that if your data is being
flagged as "not plaintext" by Mozilla, then your server is in fact sending such
control characters in what it claims to be text/plain data...

> how do I turn this on-purpose bug in Mozilla off?

On the server side, send a charset that's not ISO-8859-1.  On the client side,
install an extension (which would need to be written, but shouldn't take too
much work), which overrides the contractid for the text/nontext detector with a
class that always detects as text.
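A rough sketch of when the check discussed here would come into play, assuming (from the comments in this bug, not from the Mozilla source) that it only concerns text/plain responses sent with no charset or with ISO-8859-1. The function name sniffing_applies is hypothetical, and the real code may take other conditions into account.

```python
def sniffing_applies(content_type: str) -> bool:
    """Guess whether a Content-Type value falls in the sniffed category.

    Assumption drawn from this thread: only text/plain with no charset,
    or with charset ISO-8859-1, is second-guessed.
    """
    parts = [p.strip() for p in content_type.split(";")]
    if parts[0].lower() != "text/plain":
        return False
    charset = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    return charset is None or charset == "iso-8859-1"
```

Under that assumption, a server sending "text/plain; charset=utf-8" would not be second-guessed, which is the server-side workaround suggested above.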
or you could hack the source code to comment out the text/plain sniffing
(nsURILoader.cpp)
(In reply to comment #4)
> > Well, that's... very Internet Explorer-like.
> 
> Being insulting is a good way to get people to deal with you, eh?

Correct. But ignoring the CT sent by the server has traditionally been an
iexplore "feature", hasn't it?

> Consider the situation here.  We spent several years trying to convince the
> developers of what is by far the most popular web server around to fix a major
> bug in their server (please do see the Apache bugs I cited).  When they made it
> abundantly clear that they had no plans to ever fix it, we implemented a
> workaround (just like we implement workarounds for various other bugs in HTTP
> implementations, including bugs in the content-disposition handling in Apache,
> bugs in pipelining support in various servers, bugs in Content-Location support,
> etc).

My observation is that you traded a glitch that's easy to bypass by choosing
"Save link target as..." for a glitch that the user can only get around by
writing an extension. I just can't get myself to see this as an improvement.

>  This is quite different from what IE does.

Mozilla ignores the CT header, guesses what it should do with the document on
its own without providing a user-accessible way to change that. How is it
different? This is an honest question, I don't see any difference.

> > and my Apache is perfectly ok
> 
> Actually, it's not clear to me that it is.  The definition of the text/plain
> type says:
> 
>    Note that the control characters including DEL (0-31, 127) have no defined
>    meaning apart from the combination CRLF (ASCII values 13 and 10)
>    indicating a new line.  Two of the characters have de facto meanings
>    in wide use: FF (12) often means "start subsequent text on the
>    beginning of a new page"; and TAB or HT (9) often (though not always)
>    means "move the cursor to the next available column after the current
>    position where the column number is a multiple of 8 (counting the
>    first column as column 0)." Apart from this, any use of the control
>    characters or DEL in a body must be part of a private agreement
>    between the sender and recipient.  Such private agreements are
>    discouraged ...
> 
> (see http://www.faqs.org/rfcs/rfc1521.html).  Note that if your data is being
> flagged as "not plaintext" by Mozilla, then your server is in fact sending such
> control characters in what it claims to be text/plain data...

It contains TABs. While I agree that tabulator is a problematic character,
refusing to render a document which contains a TAB is questionable. At the very
least, Mozilla should have provided a preferences setting.

Hm, TABs in
http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab/Makefile *don't*
prevent Mozilla from rendering the document as text/plain. Does the hack contain
an exception for Makefiles? Or rather, which part of the Help contains a
description of this? Searching for "apache" or "text/plain" gives no matching items.

> > how do I turn this on-purpose bug in Mozilla off?
> 
> On the server side, send a charset that's not ISO-8859-1.  On the client side,
> install an extension (which would need to be written, but shouldn't take too
> much work), which overrides the contractid for the text/nontext detector with a
> class that always detects as text.

I think I'll take the route described in comment #5.

Not sure if you care or if this is the best forum, but here's my portion of
"customer feedback":

I'm seriously disturbed by recent developments in Mozilla, it resembles Internet
Explorer more and more.

I have probably stretched the happy season by avoiding Firefox, but Mozilla
1.8(a6) shoved a major behavioral change down my throat by switching to a
MSWindows-style keyboard shortcut scheme (and I'm not the only one who does not enjoy
this), which means I'm now left without an acceptable browser.

I wouldn't be such a PITA to deal with here if there were a viable alternative;
I would have just stopped using Mozilla, thanks for the fine four years; but,
Mozilla being the only usable (for me) graphical, js-enabled browser on FreeBSD,
I'm now locked in with gradually alienating software.

So, that's about it, folks. I just wanted you to know that using recent Mozilla
versions hurts, and I hope to be a former Mozilla user really soon.
no, TAB does not trigger non-text detection. there must be a different character
in the file too.

> MSWindows-style keyboard shortcut scheme

hm? which keyboard shortcuts changed recently?
(In reply to comment #7)
> no, TAB does not trigger non-text detection. there must be a different character
> in the file too.

doesn't seem to be the case:

roman@isis ~/tmp 1045:0 > fetch http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab.shar
pecl-statgrab.shar                            100% of 5206  B  153 kBps
roman@isis ~/tmp 1046:0 > perl -ne 'if (/.*([^[:print:]\t\n\040]).*/) {printf("%d: %s (%s)\n", $., $1, $_);}' < pecl-statgrab.shar
roman@isis ~/tmp 1047:0 > 

> > MSWindows-style keyboard shortcut scheme
> 
> hm? which keyboard shortcuts changed recently?

^U, ^H, ^W etc. in the location bar, text inputs, textareas, etc.
also note that Mozilla displays http://svn.collab.net/repos/svn/trunk/HACKING
just fine even though the document contains ^L (pagebreak) characters, is served
by Apache, and is declared text/plain, ISO-8859-1:

roman@isis ~/tmp 1049:2 > curl -I http://svn.collab.net/repos/svn/trunk/HACKING
HTTP/1.1 200 OK
Date: Sun, 06 Mar 2005 18:08:39 GMT
Server: Apache/2.0.50 (Unix) mod_ssl/2.0.50 OpenSSL/0.9.6 DAV/2 SVN/1.1.3
ETag: "13268//trunk/HACKING"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Type: text/plain; charset=ISO-8859-1

roman@isis ~/tmp 1050:0 > 

At this moment, I wonder just *how* does Mozilla decide on what it will do. Is
there a description of the algorithm somewhere?
(In reply to comment #8)
> http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab.shar             

I don't get the helper app dialog for that file...

> ^U, ^H, ^W etc. in the location bar, text inputs, textareas, etc.

ah - just set your gnome prefs to use emacs shortcuts.

(In reply to comment #9)
> At this moment, I wonder just *how* does Mozilla decide on what it will do. Is
> there a description of the algorithm somewhere?

http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#526
is the source code for it. ^L is ascii code 12 (\f) and moz treats that as a
text character.

A description of the algorithm can also be found at
http://www.mozilla.org/docs/web-developer/mimetypes.html#http which I mentioned
in comment 1; maybe that document should more clearly spell out which characters
are text characters...
(In reply to comment #10)
> (In reply to comment #8)
> > http://codex.sigpipe.cz/FreeBSD/ports/sysutils/pecl-statgrab.shar             
> 
> I don't get the helper app dialog for that file...

Hm, I don't now either. The web server's admin has probably made some changes in
the meantime... But the headers are identical. WTF!?

roman@isis ~ 1001:0 > telnet codex 80
Trying 81.95.102.106...
Connected to codex.sigpipe.cz.
Escape character is '^]'.
HEAD /FreeBSD/ports/sysutils/pecl-statgrab.shar HTTP/1.1
Host: codex.sigpipe.cz

HTTP/1.1 200 OK
Date: Mon, 07 Mar 2005 08:37:09 GMT
Server: Apache
Last-Modified: Tue, 22 Feb 2005 23:49:39 GMT
ETag: "45b8f4-1456-3bdd2ac0"
Accept-Ranges: bytes
Content-Length: 5206
Content-Type: text/plain

Connection closed by foreign host.

> > ^U, ^H, ^W etc. in the location bar, text inputs, textareas, etc.
> 
> ah - just set your gnome prefs to use emacs shortcuts.

I don't use gnome, that's the point. But that's OT here.

> (In reply to comment #9)
> > At this moment, I wonder just *how* does Mozilla decide on what it will do. Is
> > there a description of the algorithm somewhere?
> 
> http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#526
> is the source code for it. ^L is ascii code 12 (\f) and moz treats that as a
> text character.
> 
> A description of the algorithm can also be found at
> http://www.mozilla.org/docs/web-developer/mimetypes.html#http which I mentioned
> in comment 1; maybe that document should more clearly spell out which characters
> are text characters...

No, "Text bytes are 9-13, 27, and 31-255." is sufficient. I wasn't sure that was
all there was to it, hence the question.
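As an illustration of that one-sentence rule, here is a minimal sketch that applies the quoted byte ranges to a buffer. The ranges are copied verbatim from the sentence above rather than from nsUnknownDecoder.cpp, and the function names are made up. The real detector may inspect only part of the data and apply further conditions.

```python
# Byte values quoted above as "text": 9-13, 27, and 31-255.
# Assumption: taken from the quoted sentence, not from the Mozilla source.
TEXT_BYTES = frozenset(range(9, 14)) | {27} | frozenset(range(31, 256))


def first_non_text_offset(data: bytes):
    """Return the offset of the first byte outside TEXT_BYTES, or None."""
    for offset, byte in enumerate(data):
        if byte not in TEXT_BYTES:
            return offset
    return None


def looks_like_text(data: bytes) -> bool:
    """True when every byte in the buffer is in the quoted text set."""
    return first_non_text_offset(data) is None
```

With these ranges, TAB (9) and form feed (12) both count as text, which matches the earlier comments that neither should trigger the non-text path.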

Anyway, since Mozilla doesn't display the behavior anymore and I don't know what
has changed, I'm leaving this for now. Perhaps the browser needs to run for some
time to gather a sufficient amount of entropy. (I had the machine, and probably
the browser as well, up for a few days.) Should I return if the problem reappears?
Maybe someone could help me through a bit of debugging.
> I don't use gnome, that's the point. But that's OT here.

Then use the GTK1 Mozilla builds -- the GTK2 ones are very explicitly trying to
be a GNOME application.

If this problem reappears, say something, please.  We shouldn't be flagging that
file as non-text, and if we are, something is badly broken...  When it happens,
an HTTP log (per
http://www.mozilla.org/projects/netlib/http/http-debugging.html) would probably
be the first place to start.