Closed Bug 110767 Opened 23 years ago Closed 23 years ago

Incorrect interpretation of "#!" in nsUnknownDecoder::DetermineContentType

Categories

(Core :: Networking, defect, P3)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.7

People

(Reporter: tenthumbs, Assigned: bzbarsky)

Details

Attachments

(1 file)

In nsUnknownDecoder::DetermineContentType, the code says:

 // If the buffer begins with "#!" or "%!" then it is a script of some
 // sort...

but that's incorrect.

What "#!" really means is that the text immediately following is the name
of some interpreter (big emphasis on interpreter). The file content can be
anything meaningful to the interpreter, including binary data.

All unix kernels read a small amount of an executable file. If the first
two characters are "#!" the the kernel parses up to the first newline or
the end of the buffer (whichever comes first). If it finds something that
looks like an interpreter, it rewrites the argv array and calls the
interpreter it found.

Mozilla needs to look past this. If it finds a "#!", it must check beyond
the first newline (or some fixed amount) before it decides an object is
text.
->bz
Assignee: neeti → bzbarsky
So what you're saying is that we should dump the check for "#!" completely and
just look for null chars like we do later in the code?
The code in question was simply moved from the original 'unknown content
sniffer' in the 4x codebase :-)

Personally, i think you are correct and we should just remove this check all
together...  if it turns out to be text/plain then great.  otherwise, we should
treat it as application/octet-stream and save it ...

-- rick
Attached patch Patch to fixSplinter Review
Actually, looking at the comment before that check I see what the issue is.  If
I have a per script that emits html, we want it to be text/plain, not
text/html....

This patch should fix things to be happy
bbaetz, would you review?
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.7
Why aren't we removing the #! check altogether?

Also, how many btyes could we be scanning?
------------------------------------------
#! /usr/bin/perl

print "Content-type: text/html\n\n";
print "<html><body>This is some HTML</body></html>"
------------------------------------------

We want to detect this snippet as text/plain, not text/html

We would be scanning at most 1024 bytes (that's the buffer size).  Note that if
I were to remove this check completely we would first search for various html
tags (not to quick) and then _still_ scan the 1024 bytes if no html tags are
found.  So perf-wise removal and my change are probably equivalent.
I don't see anything wrong with a special test in "#!" handling as long 
as mBufferLen is always big enough. You need 128 bytes for Linux. 
However, it does seem inefficient.

I also think you want to check for more control characters than just 
nulls.
 
Make that you need to check more than 128 bytes on Linux if the first  
128 are all text.
Comment on attachment 59329 [details] [diff] [review]
Patch to fix

Ah, ok. r=bbaetz
Attachment #59329 - Flags: review+
> I also think you want to check for more control characters than just 
> nulls.

Not really... those control characters could just be normal text characters in
plain text that's not in the ASCII encoding.  Even checking for nulls is a hack
-- two-byte encodings will test as binary.

We want to flag as text as many text documents as we can, in my opinion.  If
this causes a few binary documents to be treated as text, this is an acceptable
alternative... the user can still save them after they've displayed.
> We want to flag as text as many text documents as we can, in my 
> opinion.

An awful lot of users freak if garbage appears on their screens. Even
more do not realize they can save a file that way. Besides, has mozilla
in the past, or does it now, or may it in the future, do newline
conversions on text files. If so, your binary data is screwed. I think
you're wrong about text files.
Comment on attachment 59329 [details] [diff] [review]
Patch to fix

sr=rpotts@netscape.com
Attachment #59329 - Flags: superreview+
tenthumbs, if we fix text file stuff we should fix it not only here but also
later in the mime decoder... Please file a separate bug on that?
checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: