Last Comment Bug 394416 - Feed sniffing should apply to fewer mime types
: Feed sniffing should apply to fewer mime types
Status: VERIFIED FIXED
: html5, perf
Product: Firefox
Classification: Client Software
Component: RSS Discovery and Preview (show other bugs)
: Trunk
: All All
: P5 normal with 1 vote (vote)
: Firefox 3.1b1
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on:
Blocks: 360705
  Show dependency treegraph
 
Reported: 2007-08-30 23:30 PDT by Boris Zbarsky [:bz]
Modified: 2009-06-01 08:56 PDT (History)
13 users (show)
mconnor: blocking‑firefox3-
mconnor: blocking‑firefox3.5-
mconnor: wanted‑firefox3+
mconnor: wanted‑firefox3.5+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Let's try this (1.93 KB, patch)
2008-08-22 10:00 PDT, Boris Zbarsky [:bz]
sayrer: review+
Details | Diff | Review

Description Boris Zbarsky [:bz] 2007-08-30 23:30:49 PDT
Right now, we sniff all loads in the content area, subframes, or <object> for possibly being feeds.

This seems wrong to me.  I see no reason to sniff an image/png or video/whatever or application/x-msword file to see whether it might be a feed.  What we do now is bad for the web in general, not to mention a possible performance drag.

Realistically, this code should only sniff types that are likely to be feeds.  A few of the relevant XML types, text/plain, the "maybe text" type, and text/html should cover it, I suspect.
Comment 1 Boris Zbarsky [:bz] 2008-07-05 10:33:10 PDT
This also happens to violate HTML5, and is causing real-life interoperability issues...
Comment 2 Phil Ringnalda (:philor) 2008-07-05 12:51:55 PDT
So are you saying that you want this bug to implement HTML5 sniffing, which means only sniffing for the results of incompetent PHP programmers who send text/html while making it impossible to subscribe to static foo.rss or foo.atom if served by an Apache server with a mime.types file from before August 2007, or to subscribe to foo.xml at all?
Comment 3 Boris Zbarsky [:bz] 2008-07-05 13:25:30 PDT
HTML5 sniffing specifically allows you to sniff the Apache text/plain brokenness, so I have no idea what you're talking about with "impossible ... if served by an Apache server".

I think what we should be doing is what I said in comment 0: text/html, XML types, and the HTML5-allowed sniffing.
Comment 4 Phil Ringnalda (:philor) 2008-07-05 14:36:13 PDT
Yes, HTML5 sniffing allows you to sniff Apache text/plain: as text/plain, or as application/octet-stream. Nothing else. There is only one possible entry into feed sniffing in the HTML5 algo, "text/html". No sniffing of text/xml or application/xml or any type ending +xml, including RSS 1.0 served as application/rdf+xml, is allowed.
Comment 5 Robert Sayre 2008-07-05 16:45:22 PDT
(In reply to comment #4)
> No sniffing of text/xml or
> application/xml or any type ending +xml, including RSS 1.0 served as
> application/rdf+xml, is allowed.

If that is correct, I don't see how we can follow HTML5's current text. Do any browsers behave that way?
Comment 6 Anne (:annevk) 2008-07-05 17:57:55 PDT
I don't see how HTML5 forbids "sniffing" XML for feeds. (I don't think that's considered sniffing.)
Comment 7 Phil Ringnalda (:philor) 2008-07-05 18:35:05 PDT
(In reply to comment #5)
> Do any browsers behave that way?

Of course not.

(In reply to comment #6)
> I don't see how HTML5 forbids "sniffing" XML for feeds. (I don't think that's
> considered sniffing.)

Um, what? Please, please, please *read* http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#content-type0 paying attention to parts like 2.7.2.3 where you are told that text/plain may only remain text/plain or become application/octet-stream, and 2.7.2.5 where you are told unambiguously that if you are sniffing and the type is any XML type, you must stop instantly, and then 2.7.2.7 where you are told if and only if you have made it that far without being stopped and the type is text/html, you may then proceed to the lovingly detailed CONTENT-TYPE SNIFFING: feed or HTML section, and then explain to me your theory that sniffing content served under another type and treating it as though it was served with a feed type is not sniffing. I look forward to hearing wtf that section about sniffing feeds is for, if sniffing feeds is not sniffing feeds. While you are doing that, I'll dig up Hixie's mails that clearly and unambiguously say that that's exactly what he intends to say, that nothing but application/rss+xml, application/atom+xml, and text/html may be treated as being feeds when loaded as content, rather than from autodiscovery. Whether or not at some future date HTML5 will still say that, that *is* what it says today.
Comment 8 Boris Zbarsky [:bz] 2008-07-05 19:18:57 PDT
That section of HTML5 used to say something much more reasonable, and it sounds like it should be changed to say that more reasonable thing again.  Please post to public-html accordingly?

The original point of comment 0 stands, in any case.
Comment 9 Anne (:annevk) 2008-07-05 19:28:56 PDT
The file type is XML. What happens during XML page loading is defined here: http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#read-xml

HTML5 does indeed not allow sniffing in text/plain. I did not contest that.
Comment 10 Anne (:annevk) 2008-07-05 19:31:21 PDT
(I'm not sure what you heard from Hixie and where. It does not seem to be accurate as far as I can tell. Also, reading HTML5 is part of my job...)
Comment 11 Phil Ringnalda (:philor) 2008-07-05 19:45:26 PDT
Ah, my apologies, I lacked sufficient imagination to expect that there would be sniffing and "sniffing" and that "sniffing" wouldn't be called sniffing.
Comment 12 Mike Connor [:mconnor] 2008-08-19 11:49:18 PDT
Would be good to get, but not going to block on it.
Comment 13 Boris Zbarsky [:bz] 2008-08-22 10:00:25 PDT
Created attachment 335064 [details] [diff] [review]
Let's try this
Comment 14 Robert Sayre 2008-09-04 15:07:28 PDT
Comment on attachment 335064 [details] [diff] [review]
Let's try this


>+  // Don't sniff arbitrary types.  Limit sniffing to situations that
>+  // we think can reasonably arise.
>+  if (!contentType.EqualsLiteral(TEXT_PLAIN) &&

Let's take this out, per HTML5. We can put it back if it turns out not to be web compatible.

>+      !contentType.EqualsLiteral(APPLICATION_GUESS_FROM_EXT) &&
>+      !contentType.EqualsLiteral(TEXT_HTML) &&
>+      !contentType.EqualsLiteral(APPLICATION_OCTET_STREAM) &&
>+      // Same criterion as XMLHttpRequest.  Should we be checking for "+xml"
>+      // and check for text/xml and application/xml by hand instead?
>+      contentType.Find("xml") == -1)

We can do better here, let's file a follow-up.
Comment 15 Boris Zbarsky [:bz] 2008-09-04 16:22:00 PDT
I'll take out the APPLICATION_GUESS_FROM_EXT too, then, since that's our "text/plain that looks like binary" thing.
Comment 16 Boris Zbarsky [:bz] 2008-09-05 10:40:37 PDT
Pushed changeset 45d7857e2e0e with that change.
Comment 17 Tanner M. Young [:tmyoung] 2009-06-01 08:56:20 PDT
Verified that it has been fixed on build

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1pre) Gecko/20090530 Shiretoko/3.5pre (.NET CLR 3.5.30729)

Note You need to log in before you can comment on or make changes to this bug.