Right now, we sniff all loads in the content area, subframes, or <object> for possibly being feeds.
This seems wrong to me. I see no reason to sniff an image/png or video/whatever or application/x-msword file to see whether it might be a feed. What we do now is bad for the web in general, not to mention a possible performance drag.
Realistically, this code should only sniff types that are likely to be feeds. A few of the relevant XML types, text/plain, the "maybe text" type, and text/html should cover it, I suspect.
This also happens to violate HTML5, and is causing real-life interoperability issues...
So are you saying that you want this bug to implement HTML5 sniffing, which means only sniffing for the results of incompetent PHP programmers who send text/html while making it impossible to subscribe to static foo.rss or foo.atom if served by an Apache server with a mime.types file from before August 2007, or to subscribe to foo.xml at all?
HTML5 sniffing specifically allows you to sniff the Apache text/plain brokenness, so I have no idea what you're talking about with "impossible ... if served by an Apache server".
I think what we should be doing is what I said in comment 0: text/html, XML types, and the HTML5-allowed sniffing.
Yes, HTML5 sniffing allows you to sniff Apache text/plain: as text/plain, or as application/octet-stream. Nothing else. There is only one possible entry into feed sniffing in the HTML5 algo, "text/html". No sniffing of text/xml or application/xml or any type ending +xml, including RSS 1.0 served as application/rdf+xml, is allowed.
(In reply to comment #4)
> No sniffing of text/xml or
> application/xml or any type ending +xml, including RSS 1.0 served as
> application/rdf+xml, is allowed.
If that is correct, I don't see how we can follow HTML5's current text. Do any browsers behave that way?
I don't see how HTML5 forbids "sniffing" XML for feeds. (I don't think that's considered sniffing.)
(In reply to comment #5)
> Do any browsers behave that way?
Of course not.
(In reply to comment #6)
> I don't see how HTML5 forbids "sniffing" XML for feeds. (I don't think that's
> considered sniffing.)
Um, what? Please, please, please *read* http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#content-type0 paying attention to parts like 220.127.116.11 where you are told that text/plain may only remain text/plain or become application/octet-stream, and 18.104.22.168 where you are told unambiguously that if you are sniffing and the type is any XML type, you must stop instantly, and then 22.214.171.124 where you are told if and only if you have made it that far without being stopped and the type is text/html, you may then proceed to the lovingly detailed CONTENT-TYPE SNIFFING: feed or HTML section, and then explain to me your theory that sniffing content served under another type and treating it as though it was served with a feed type is not sniffing. I look forward to hearing wtf that section about sniffing feeds is for, if sniffing feeds is not sniffing feeds. While you are doing that, I'll dig up Hixie's mails that clearly and unambiguously say that that's exactly what he intends to say, that nothing but application/rss+xml, application/atom+xml, and text/html may be treated as being feeds when loaded as content, rather than from autodiscovery. Whether or not at some future date HTML5 will still say that, that *is* what it says today.
That section of HTML5 used to say something much more reasonable, and it sounds like it should be changed to say that more reasonable thing again. Please post to public-html accordingly?
The original point of comment 0 stands, in any case.
The file type is XML. What happens during XML page loading is defined here: http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#read-xml
HTML5 does indeed not allow sniffing in text/plain. I did not contest that.
(I'm not sure what you heard from Hixie and where. It does not seem to be accurate as far as I can tell. Also, reading HTML5 is part of my job...)
Ah, my apologies, I lacked sufficient imagination to expect that there would be sniffing and "sniffing" and that "sniffing" wouldn't be called sniffing.
Would be good to get, but not going to block on it.
Created attachment 335064 [details] [diff] [review]
Let's try this
Comment on attachment 335064 [details] [diff] [review]
Let's try this
>+ // Don't sniff arbitrary types. Limit sniffing to situations that
>+ // we think can reasonably arise.
>+ if (!contentType.EqualsLiteral(TEXT_PLAIN) &&
Let's take this out, per HTML5. We can put it back if it turns out not to be web compatible.
>+ !contentType.EqualsLiteral(APPLICATION_GUESS_FROM_EXT) &&
>+ !contentType.EqualsLiteral(TEXT_HTML) &&
>+ !contentType.EqualsLiteral(APPLICATION_OCTET_STREAM) &&
>+ // Same criterion as XMLHttpRequest. Should we be checking for "+xml"
>+ // and check for text/xml and application/xml by hand instead?
>+ contentType.Find("xml") == -1)
We can do better here, let's file a follow-up.
I'll take out the APPLICATION_GUESS_FROM_EXT too, then, since that's our "text/plain that looks like binary" thing.
Pushed changeset 45d7857e2e0e with that change.
Verified that it has been fixed on build
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1pre) Gecko/20090530 Shiretoko/3.5pre (.NET CLR 3.5.30729)