Closed Bug 789123 Opened 12 years ago Closed 11 years ago

Find a way to sniff MP3 without ID3 header reliably

Categories

(Core :: Audio/Video, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED DUPLICATE of bug 862088

People

(Reporter: padenot, Assigned: padenot)

References

(Blocks 1 open bug)

Details

Our old technique for sniffing mp3 files produced too many false positives (for example, the UTF-16 byte order mark, which is kind of embarrassing).
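To illustrate the BOM false positive, here is a minimal sketch. It assumes the old sniffer was essentially a test for the 11-bit MPEG frame sync (a 0xFF byte followed by a byte whose top three bits are set). The bug does not say exactly what the old code checked, and `looks_like_mpeg_sync` and `describe_header` are hypothetical names. The UTF-16LE byte order mark, 0xFF 0xFE, passes that test and even decodes as an MPEG-1 Layer I header.

```python
import codecs


def looks_like_mpeg_sync(buf: bytes) -> bool:
    """Hypothetical naive sniff: 11 set bits of frame sync at offset 0."""
    return len(buf) >= 2 and buf[0] == 0xFF and (buf[1] & 0xE0) == 0xE0


def describe_header(buf: bytes) -> tuple:
    # Version bits: 3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5, 1 = reserved.
    version = (buf[1] >> 3) & 0x3
    # Layer bits: 3 = Layer I, 2 = Layer II, 1 = Layer III, 0 = reserved.
    layer_bits = (buf[1] >> 1) & 0x3
    return version, layer_bits


# A UTF-16LE text document starts with the byte order mark 0xFF 0xFE.
utf16_doc = codecs.BOM_UTF16_LE + "<html></html>".encode("utf-16-le")
print(looks_like_mpeg_sync(utf16_doc))  # True: the BOM passes the sync test
print(describe_header(utf16_doc))       # (3, 3): MPEG-1, Layer I
```

Any check that stops at the sync word will therefore misclassify every UTF-16LE document that carries a BOM.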

We need to come up with a better way of determining whether a file is mp3.
Copying analysis from bug 789077:

We cannot do the normal thing and look for the next mp3 frame header at the expected offset, because mp3 frames can be larger than 512 bytes.
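For reference, the "expected offset" is computed from the current frame header. The following minimal sketch is restricted to MPEG-1 Layer III (other versions and layers use different tables and formulas). For that case the frame length in bytes is `144 * bitrate / sample_rate` (integer division) plus the padding bit. At 320 kbps and 32 kHz that is 1440 bytes, so the following header lies outside a 512-octet window and cannot be checked. `mpeg1_layer3_frame_length` and `next_frame_check` are hypothetical helpers.

```python
from typing import Optional

# MPEG-1 Layer III tables (bitrate index 0 = "free", 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # index 3 is reserved


def mpeg1_layer3_frame_length(h: bytes) -> Optional[int]:
    """Frame length in bytes for an MPEG-1 Layer III header, else None."""
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xE0) != 0xE0:
        return None                      # no frame sync
    if ((h[1] >> 3) & 0x3) != 0x3 or ((h[1] >> 1) & 0x3) != 0x1:
        return None                      # not MPEG-1, or not Layer III
    kbps = BITRATES_KBPS[h[2] >> 4]
    rate = SAMPLE_RATES_HZ[(h[2] >> 2) & 0x3]
    if kbps is None or rate is None:
        return None
    padding = (h[2] >> 1) & 0x1
    return 144 * kbps * 1000 // rate + padding


def next_frame_check(buf: bytes, limit: int = 512) -> Optional[bool]:
    """True/False if the next header is inside the window, None if it is not."""
    length = mpeg1_layer3_frame_length(buf)
    if length is None:
        return False
    if length + 4 > min(limit, len(buf)):
        return None  # next header lies past what we may (or can) examine
    return mpeg1_layer3_frame_length(buf[length:length + 4]) is not None


hdr_128k = bytes([0xFF, 0xFB, 0x90, 0x00])  # 128 kbps, 44.1 kHz, no padding
hdr_320k = bytes([0xFF, 0xFB, 0xE8, 0x00])  # 320 kbps, 32 kHz, no padding
print(mpeg1_layer3_frame_length(hdr_128k))  # 417
print(mpeg1_layer3_frame_length(hdr_320k))  # 1440
```

The three-valued result makes the problem explicit: for large frames the check is not "false", it is simply undecidable within the window.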

While mp3 files generally contain invalid UTF-16 sequences, these are not guaranteed to occur in the first 512 bytes.
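As an illustration of why this signal is not decisive, here is a sketch that scans the first 512 octets for unpaired UTF-16LE surrogates. It uses a hypothetical helper, `has_invalid_utf16le`, and handles little-endian only. A run of zero bytes, for example, decodes as valid UTF-16 (repeated U+0000), so a window with no surrogate errors proves nothing about the rest of the file.

```python
def has_invalid_utf16le(buf: bytes, limit: int = 512) -> bool:
    """True if the first `limit` octets contain an unpaired UTF-16LE surrogate."""
    window = buf[:limit]
    units = [window[i] | (window[i + 1] << 8)
             for i in range(0, len(window) - 1, 2)]
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:            # high surrogate
            if i + 1 == len(units):
                return False                 # pair may continue past the window
            if not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return True                  # high surrogate not followed by low
            i += 2
            continue
        if 0xDC00 <= u <= 0xDFFF:            # low surrogate with no high before it
            return True
        i += 1
    return False
```

An invalid sequence that first appears at byte 600 is invisible to this check, which is exactly the limitation described above.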

It may be possible to find some structure in mp3 frame data that cannot appear in a valid HTML-ish document, or vice versa.
"The User Agent MAY wait for 512 or more octets to arrive for the same reason as in the "text or binary" section above..."

^^
That doesn't sound like a hard limit.
I read the MAY the other way, in light of step three.

    If at any point this algorithm requires the user agent to
    determine the value of a octet in s which has not yet arrived,
    or which is past the first 512 octets, or which is beyond the
    end of the octet stream, the algorithm stops and the sniffed-
    type is "text/html".

That is, the agent MAY choose to examine up to a maximum of 512 octets out of a possibly larger number of received bytes, but can terminate earlier.

This is only the "Feed or HTML" algorithm, but the other sniffing algorithms are also specified to terminate after examining 512 octets.
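The stopping rule quoted above can be made mechanical by routing every octet access through a window that refuses to read past the first 512 octets or past the data received so far. The sketch below is only a toy subset of "Feed or HTML": it recognizes nothing but a leading `<rss`, whereas the specified algorithm handles more constructs. `SniffWindow`, `NeedMoreOctets` and `sniff_feed_or_html` are hypothetical names.

```python
class NeedMoreOctets(Exception):
    """Raised when the sniffer asks for an octet it may not examine."""


class SniffWindow:
    def __init__(self, received: bytes, limit: int = 512):
        # Only the first `limit` octets that have actually arrived are visible.
        self._data = received[:limit]

    def octet(self, i: int) -> int:
        if i >= len(self._data):
            raise NeedMoreOctets(i)  # not yet arrived, past 512, or past the end
        return self._data[i]


def sniff_feed_or_html(received: bytes, limit: int = 512) -> str:
    w = SniffWindow(received, limit)
    try:
        i = 0
        while w.octet(i) in b" \t\r\n":  # skip leading whitespace
            i += 1
        if w.octet(i) != ord("<"):
            return "text/html"
        tag = bytes(w.octet(i + 1 + k) for k in range(3))
        return "application/rss+xml" if tag == b"rss" else "text/html"
    except NeedMoreOctets:
        # The quoted rule: running out of octets yields "text/html".
        return "text/html"
```

Because the limit is enforced in one place, an algorithm written this way cannot accidentally look beyond the first 512 octets, regardless of how many bytes have arrived.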

Not that we have to follow the spec, but we should try to if it's possible.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE