Closed Bug 789077 Opened 9 years ago Closed 9 years ago

Since Bug 567077 landed, .xhtml files encoded as UTF-16 Little Endian are not displayed, but instead offered for download as MP3 files.

Categories

(Core :: Audio/Video, defect)

18 Branch
x86_64
Windows 8
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox18 - ---

People

(Reporter: xeno42alpha, Assigned: padenot)

References

()

Details

(Keywords: regression)

Attachments

(4 files, 2 obsolete files)

Steps to reproduce:
Open a .xhtml file.

Actual results:
The file gets offered for download as a MP3 file.

Expected results:
The .xhtml file is displayed as a webpage.
------------------------------------------------------------
Summary: 
After I updated Nightly to 18.0a1 (2012-09-06), I tried to open a local .xhtml file I was working on. Instead getting displayed as a web page, it was offered for download as an MP3 file. 
Trying to open http://www.rialgar.de/ got the same result.
------------------------------------------------------------
Additional Information:
Response from Server:
HTTP/1.1 200 OK
Date: Thu, 06 Sep 2012 15:41:30 GMT
Server: Apache
Last-Modified: Sat, 25 Aug 2012 14:33:24 GMT
ETag: "1c7348cb-1a40-4c817f995f500"
Accept-Ranges: bytes
Content-Length: 6720
X-Powered-By: PleskLin
Keep-Alive: timeout=15, max=98
Connection: Keep-Alive
Content-Type: application/xhtml+xml
Blocks: 567077
Component: Networking → Video/Audio
Keywords: regression
Status: UNCONFIRMED → NEW
Ever confirmed: true
I can reproduce, but only on certain web pages. Most of them are not affected. I'll investigate right now.
Status: NEW → ASSIGNED
Assignee: nobody → paul
Note that this occurs even when the site sends a mimetype:

$ curl --head http://rialgar.de/
...
Content-Type: application/xhtml+xml
rialgar.de uses a UTF-16 encoding, starting with the 0xfffe byte-order marker, which looks like it matches your 0xfffa,0xfffb mask pattern. You may need to look for the next header at the expected offset to reliably identify mp3.
So my idea doesn't work, because draft-ietf-websec-mime-sniff says the sniffer stops after 512 bytes, and an mp3 frame can be longer than that.

I checked 8 different mp3 files with 'iconv --from-code=UTF-16 --to-code=UTF-8 file.mp3 > /dev/null' and while all contained an invalid UTF-16 sequence, for several files it with more than 512 bytes in, so that doesn't work either.

Looks like mp3 sniffing will require some format research.
This patch removes the pattern for an mp3 file without ID3 tag, so we don't
break the web. Bug 789123 has been filed to add a decent mechanism to sniff for
and mp3 file without ID3 tags.

This patch also removes the test. It does not remove the file, since I have good
hope to be able to re-enable the test in the near future.
Attachment #658927 - Flags: review?(cpearce)
But why are we sniffing at all if the server sent "Content-Type: application/xhtml+xml"?
bz, as far as my research went, [1] allows the overriding of the doctype using sniffers. I _guess_ this is the code path that is used when loading a page.

[1]: http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#9430
Yes, but the media sniffer is not supposed to do overriding, I would think.  It _can_ given how it hooks in, but it _shouldn't_.
Here is a patch. As discussed on IRC, we return early if the type is not empty,
or if it is not application/octet-stream.
Attachment #658962 - Flags: review?(bzbarsky)
Comment on attachment 658962 [details] [diff] [review]
Sniff for a media only if the Content-Type is unknown, or octet-stream. r=

As written, this will return if the type is nonempty.  As in, it'll break octet-stream sniffing (which I hope we have tests for!)

I think you want something more like this:

  if (!contentType.IsEmpty() &&
      !contentType.EqualsLiteral(APPLICATION_OCTET_STREAM) &&
      !contentType.EqualsLiteral(UNKNOWN_CONTENT_TYPE)) {
    return NS_ERROR_NOT_AVAILABLE;
  }

r=me with that.
Attachment #658962 - Flags: review?(bzbarsky) → review+
Comment on attachment 658927 [details] [diff] [review]
Remove sniffing for mp3 without ID3, because of false positive. r=

Review of attachment 658927 [details] [diff] [review]:
-----------------------------------------------------------------

Good. Can you make Boris's changes, upload the patch, and push to Try? Let's get this landed ASAP.
Attachment #658927 - Flags: review?(cpearce) → review+
I ran some test with local files, results as following:
– UTF-8 without Byte Order Mark: works
– UTF-8 with Byte Order Mark: works
– UTF-16 Big Endian: works
– UTF-16 Little Endian: fails
Both patches (with bz's comment addressed) pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=ac22cdf7b347
(In reply to xeno42alpha from comment #14)
> I ran some test with local files, results as following:
> – UTF-8 without Byte Order Mark: works
> – UTF-8 with Byte Order Mark: works
> – UTF-16 Big Endian: works
> – UTF-16 Little Endian: fails

Thanks for testing, that matches my results as well.
Summary: Since Bug 567077 landed, .xhtml files are not displayed, but instead offered for download as MP3 files. → Since Bug 567077 landed, .xhtml files encoded as UTF-16 Little Endian are not displayed, but instead offered for download as MP3 files.
I have never seen a MP3 file with FFFE, it's always FFFA or more commonly FFFB. The only completely safe way to detect MP3 would be to find the first frame using the frame sync, read the header and compute the frame length and then search for the next frame at the specific offset, if it's there, it's MP3, if not, it isn't.
We investigated doing something like comment 17 describes, but unfortunately, the spec requires us to read at most 512 bytes, and we thought that an mp3 frame could be longer than that. Can someone confirm or infirm that?

We should discuss this in bug 789123, which is about implementing mp3 sniffing properly.
Attachment #658927 - Attachment is obsolete: true
Attachment #658962 - Attachment is obsolete: true
Keywords: checkin-needed
(In reply to Paul ADENOT (:padenot) from comment #19)
> Created attachment 659392 [details] [diff] [review]
> This is green on try: https://tbpl.mozilla.org/?tree=Try&rev=ac22cdf7b347

Pushing directly to m-c for tomorrow's nightly.

https://hg.mozilla.org/mozilla-central/rev/882b5dcaed46
https://hg.mozilla.org/mozilla-central/rev/530da2e30cef

Should this have tests?
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite?
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Ryan, this is a bug fix for a feature that already have tests.

On the other hands our test suite did check the behavior we want, i.e. that the media sniffer sniff only when we want it to sniff. I'll see to write a test for that part, I suppose we can do it with an xpcshell test.
I figure that when things regress, there's a hole in our test coverage :)
Depends on: 789741
Tracking bug 567077 in case there are more regressions we need to have awareness about but not tracking here since this is resolved.
You need to log in before you can comment on or make changes to this bug.