Closed Bug 117842 Opened 23 years ago Closed 23 years ago

Plain text files without extension may appear as HTML

Categories

(Core :: Networking, defect)

x86
Windows NT
defect
Not set
normal

VERIFIED WONTFIX

People

(Reporter: serhunt, Assigned: neeti)

Details

1. Create a text file with Notepad or a similar editor.
2. Make sure you strip the extension off, so that the MIME type is not specified.
3. Open the file in the browser and observe that it is displayed as plain text.
4. Add the string <html (no need for the closing bracket) anywhere in the text and open the file again.
Result: the text is now displayed HTML-formatted, and the "<html" string is not shown.
Expected: plain text, just as in step 3, with the "<html" string shown.
Note that the "<html" string does not need to be anywhere near the top of the text; it can go anywhere, even at the very bottom of a large text. For comparison, IE also does this, but not always: only when the "<html" string is found close to the beginning of the text, within the first several lines, no more.
Just a couple of words on why anybody would care. We have a tester plugin which shows log output in the browser window as plain text. Sometimes part of a page's HTML source is present in the log. In such a case the whole log output gets screwed up. As a temporary workaround, I changed the tester plugin so that it replaces any occurrence of "<html" with "<@tml" in the output buffer before sending it to the browser.
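For illustration, the workaround described above could look roughly like the following. This is a minimal sketch in Python, not the tester plugin's actual code; the function name mask_html_marker is hypothetical, and it assumes the output buffer is available as bytes and that only the exact lowercase string "<html" needs masking.

```python
def mask_html_marker(buf: bytes) -> bytes:
    """Replace every "<html" in the output buffer with "<@tml".

    Hypothetical helper: it mirrors the workaround described in the
    comment above so that the browser's content sniffer no longer
    finds the "<html" marker in the log output.
    """
    return buf.replace(b"<html", b"<@tml")


log_output = b"request sent\n<html><body>page source</body></html>\n"
print(mask_html_marker(log_output).decode())
```

Note that the closing "</html>" is left untouched, since it does not contain the "<html" string.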
Unfortunately, there is no way to tell a priori that it's a text file. Since there is no extension to go on, we sniff the content, and <html triggers detection as text/html (see http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#335). Also, we look in the first 1024 bytes (http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#52). Over to networking and cc'ing rpotts. Would it make sense to sniff only 128 bytes (or 256 bytes)?
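To make the trade-off concrete, here is a simplified sketch of this kind of content sniffing. It is not the nsUnknownDecoder implementation: the function name sniff_content_type is hypothetical, the real decoder checks more than this one marker, and the case-insensitive match is an assumption. It only shows how the size of the sniffed window changes the outcome.

```python
def sniff_content_type(data: bytes, window: int = 1024) -> str:
    """Guess a MIME type by looking for "<html" in the first `window` bytes.

    Simplified illustration only; the case-insensitive comparison is an
    assumption, not something stated in the bug.
    """
    head = data[:window].lower()
    return "text/html" if b"<html" in head else "text/plain"


# A log whose "<html" marker starts 201 bytes into the file.
log = b"x" * 200 + b"\n<html"

print(sniff_content_type(log, window=1024))  # text/html
print(sniff_content_type(log, window=128))   # text/plain
```

With a smaller window, a stray "<html" deeper in a text file no longer triggers HTML detection, at the cost of missing HTML documents whose opening tag appears later than the window.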
Assignee: harishd → neeti
Component: Parser → Networking
QA Contact: moied → benc
It probably would. I don't usually see much code on HTML pages before the <html> tag. 128 looks like a good heuristic choice.
I would rather have text appear as HTML. The other way around would be painful for most users. I don't think that we can fix this. Given no MIME type or file extension, it is probably going to be opened as HTML. Boris, file a bug; maybe we can optimize further.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
VERIFIED: possible future testcase.
Status: RESOLVED → VERIFIED