Closed Bug 110767 Opened 23 years ago Closed 23 years ago

Incorrect interpretation of "#!" in nsUnknownDecoder::DetermineContentType

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.7

People

(Reporter: tenthumbs, Assigned: bzbarsky)

Details

Attachments

(1 file)

Patch to fix 23 years ago Boris Zbarsky [:bzbarsky] 1.30 KB, patch	bbaetz : review+ rpotts : superreview+	Details \| Diff \| Splinter Review

tenthumbs

Reporter

Description

•

23 years ago

In nsUnknownDecoder::DetermineContentType, the code says:

 // If the buffer begins with "#!" or "%!" then it is a script of some
 // sort...

but that's incorrect.

What "#!" really means is that the text immediately following is the name
of some interpreter (big emphasis on interpreter). The file content can be
anything meaningful to the interpreter, including binary data.

All unix kernels read a small amount of an executable file. If the first
two characters are "#!" the the kernel parses up to the first newline or
the end of the buffer (whichever comes first). If it finds something that
looks like an interpreter, it rewrites the argv array and calls the
interpreter it found.

Mozilla needs to look past this. If it finds a "#!", it must check beyond
the first newline (or some fixed amount) before it decides an object is
text.

Bradley Baetz (:bbaetz)

Comment 1

•

23 years ago

->bz

Assignee: neeti → bzbarsky

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

23 years ago

So what you're saying is that we should dump the check for "#!" completely and
just look for null chars like we do later in the code?

rpotts (gone)

Comment 3

•

23 years ago

The code in question was simply moved from the original 'unknown content
sniffer' in the 4x codebase :-)

Personally, i think you are correct and we should just remove this check all
together...  if it turns out to be text/plain then great.  otherwise, we should
treat it as application/octet-stream and save it ...

-- rick

Boris Zbarsky [:bzbarsky]

Assignee

Comment 4

•

23 years ago

Attached patch Patch to fix — Details — Splinter Review

Actually, looking at the comment before that check I see what the issue is.  If
I have a per script that emits html, we want it to be text/plain, not
text/html....

This patch should fix things to be happy

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

23 years ago

bbaetz, would you review?

Status: NEW → ASSIGNED

Priority: -- → P3

Target Milestone: --- → mozilla0.9.7

Bradley Baetz (:bbaetz)

Comment 6

•

23 years ago

Why aren't we removing the #! check altogether?

Also, how many btyes could we be scanning?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

23 years ago

------------------------------------------
#! /usr/bin/perl

print "Content-type: text/html\n\n";
print "<html><body>This is some HTML</body></html>"
------------------------------------------

We want to detect this snippet as text/plain, not text/html

We would be scanning at most 1024 bytes (that's the buffer size).  Note that if
I were to remove this check completely we would first search for various html
tags (not to quick) and then _still_ scan the 1024 bytes if no html tags are
found.  So perf-wise removal and my change are probably equivalent.

tenthumbs

Reporter

Comment 8

•

23 years ago

I don't see anything wrong with a special test in "#!" handling as long 
as mBufferLen is always big enough. You need 128 bytes for Linux. 
However, it does seem inefficient.

I also think you want to check for more control characters than just 
nulls.

tenthumbs

Reporter

Comment 9

•

23 years ago

Make that you need to check more than 128 bytes on Linux if the first  
128 are all text.

Bradley Baetz (:bbaetz)

Comment 10

•

23 years ago

Comment on attachment 59329 [details] [diff] [review]
Patch to fix

Ah, ok. r=bbaetz

Attachment #59329 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

23 years ago

> I also think you want to check for more control characters than just 
> nulls.

Not really... those control characters could just be normal text characters in
plain text that's not in the ASCII encoding.  Even checking for nulls is a hack
-- two-byte encodings will test as binary.

We want to flag as text as many text documents as we can, in my opinion.  If
this causes a few binary documents to be treated as text, this is an acceptable
alternative... the user can still save them after they've displayed.

tenthumbs

Reporter

Comment 12

•

23 years ago

> We want to flag as text as many text documents as we can, in my 
> opinion.

An awful lot of users freak if garbage appears on their screens. Even
more do not realize they can save a file that way. Besides, has mozilla
in the past, or does it now, or may it in the future, do newline
conversions on text files. If so, your binary data is screwed. I think
you're wrong about text files.

rpotts (gone)

Comment 13

•

23 years ago

Comment on attachment 59329 [details] [diff] [review]
Patch to fix

sr=rpotts@netscape.com

Attachment #59329 - Flags: superreview+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 14

•

23 years ago

tenthumbs, if we fix text file stuff we should fix it not only here but also
later in the mime decoder... Please file a separate bug on that?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

23 years ago

checked in

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Incorrect interpretation of "#!" in nsUnknownDecoder::DetermineContentType

Categories

(Core :: Networking, defect, P3)

Tracking

()

People

(Reporter: tenthumbs, Assigned: bzbarsky)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Attachment

General

Description

File Name

Content Type