"Parse MP4 metadata failed" error when trying to play certain QTFF/MP4 files
Categories: Core :: Audio/Video, defect, P3
People: Reporter: Chris.Paucar, Assigned: jbauman
Attachments: 3 files
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36
Steps to reproduce:
We are using LIVE555 software to create QTFF/MP4 video files from RTSP cameras. We configured it for MP4 output. Chrome plays all videos from all cameras. Firefox has a parsing issue with videos from certain cameras, but not others. Firefox had the same results in both Windows and Linux.
Link to "failed" video: https://rocdevdata.blob.core.windows.net/firefox/B0C55431BD50-evtid-5EBEE83A-5EBEE858.mp4
Link to working video: https://rocdevdata.blob.core.windows.net/firefox/Pablo.mp4
Open the links in Firefox. Also attached the "failed" video just in case.
Actual results:
The error "No video with supported format and MIME type found" appears on the page.
The console logs show "Media resource https://rocdevdata.blob.core.windows.net/firefox/B0C55431BD50-evtid-5EBEE83A-5EBEE858.mp4 could not be decoded."
as well as "Media resource https://rocdevdata.blob.core.windows.net/firefox/B0C55431BD50-evtid-5EBEE83A-5EBEE858.mp4 could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
Details: virtual RefPtr<MP4Demuxer::InitPromise> __cdecl mozilla::MP4Demuxer::Init(void): Parse MP4 metadata failed"
Expected results:
Video from the file should begin playing, such as in the working video link. Audio does not matter since Firefox does not support all the audio codecs our cameras use, so we expect some videos to be silent or missing audio.
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0
Hi,
I have managed to reproduce this issue on Release version 76.0.1, Beta 78.0b1 and latest Nightly 79.0a1 (2020-05-01) using Windows 10.
I will move this over to a component so developers can take a look at it. If this is not the correct component, please feel free to change it to an appropriate one.
Thanks for your report.
Comment 2 • Reporter • 5 years ago
Please let me know if any additional information or action is needed from me, thanks.
Comment 3 • Assignee • 5 years ago
Thanks for your patience. I can reproduce this, so I don't think there's anything else I need from you. I see that the failure is occurring in parsing the descriptor of the ESDS box here: https://github.com/mozilla/mp4parse-rust/blob/309b9b8acdafa1fc34bb6e3df5b27979b272bf42/mp4parse/src/lib.rs#L2423, but I need to do a bit more digging to determine if this is an error in the parser or an invalid encoding in the file.
Comment 4 • Assignee • 5 years ago
After taking some time to analyze the code and the attached video which failed to parse, I believe there is an error in the encoding of the file, specifically the esds box. Here's a hex dump I've generated with hexdump -C -s 0x9bba0 -n 112 failed.mp4 to show the relevant portions:
0009bba0 00 00 00 67 73 74 73 64 00 00 00 00 00 00 00 01 |...gstsd........|
0009bbb0 00 00 00 57 6d 70 34 61 00 00 00 00 00 00 00 01 |...Wmp4a........|
0009bbc0 00 00 00 00 00 00 00 00 00 02 00 10 ff fe 00 00 |................|
0009bbd0 1f 40 00 00 00 00 00 33 65 73 64 73 00 00 00 00 |.@.....3esds....|
0009bbe0 03 80 80 80 2a 00 00 00 04 80 80 80 1c 40 15 00 |....*........@..|
0009bbf0 18 00 00 00 6d 60 00 00 6d 60 05 80 80 80 02 15 |....m`..m`......|
0009bc00 90 06 80 80 80 01 02 00 00 00 18 73 74 74 73 00 |...........stts.|
I'm not sure how familiar you are with the structure of MP4 files (AKA ISOBMFF), so let me know if anything I'm saying doesn't make sense and I can go into more detail.
The stsd box starts at offset 0x0009bba0 in the file with 4 bytes giving the size, 00 00 00 67 (103 bytes in decimal), followed by 4 bytes for the name of the box, 73 74 73 64 ("stsd" in ASCII). Adding 0x67 to 0x0009bba0 gives 0x0009bc07, which is where the stts box starts (00 00 00 18, 4 bytes for the length, followed by 73 74 74 73, "stts"). This tells us that the stsd box occupies the half-open offset range 0x0009bba0..0x0009bc07.
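The size-plus-name arithmetic above can be reproduced mechanically. The following is only a sketch over the 112 bytes shown in the hex dump; the names BASE, DUMP and box_span are made up for illustration and are not part of mp4parse-rust. It assumes plain 32-bit box sizes, which is what this dump contains.

```python
import struct

BASE = 0x0009BBA0  # file offset of the first byte in the hex dump

# The 112 bytes from the dump, copied row by row.
DUMP = bytes.fromhex(
    "00 00 00 67 73 74 73 64 00 00 00 00 00 00 00 01"
    "00 00 00 57 6d 70 34 61 00 00 00 00 00 00 00 01"
    "00 00 00 00 00 00 00 00 00 02 00 10 ff fe 00 00"
    "1f 40 00 00 00 00 00 33 65 73 64 73 00 00 00 00"
    "03 80 80 80 2a 00 00 00 04 80 80 80 1c 40 15 00"
    "18 00 00 00 6d 60 00 00 6d 60 05 80 80 80 02 15"
    "90 06 80 80 80 01 02 00 00 00 18 73 74 74 73 00"
)


def box_span(data, base, offset):
    """Return (name, start, end) for the box whose header is at file offset `offset`.

    Reads a big-endian 32-bit size followed by a 4-character name.
    Sizes of 0 or 1 (open-ended / 64-bit boxes) are not handled here.
    """
    size, name = struct.unpack_from(">I4s", data, offset - base)
    return name.decode("ascii"), offset, offset + size


for start in (0x0009BBA0, 0x0009BC07):
    name, begin, end = box_span(DUMP, BASE, start)
    print(f"{name}: 0x{begin:08x}..0x{end:08x}")
```

The end offset is exclusive, so the end computed for stsd is exactly the start offset passed in for stts.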
The structure of the stsd box is described in § 8.5.2 "Sample Description Box" of ISO/IEC 14496-12:2015, which is publicly available free of charge. Interpreting the contents of that box happens in the read_stsd function. This tells us that there's 1 sample entry, which follows in the form of an AudioSampleEntry (see § 12.2.3). This is parsed by read_audio_sample_entry, and we can see the codingname (just name in the code) is mp4a. Looking at the beginning of that box at offset 0x0009bbb0, we see again a 4-byte length, 00 00 00 57 (87 bytes in decimal), followed by 6d 70 34 61 ("mp4a" in ASCII). Adding 0x57 to 0x0009bbb0 again gives 0x0009bc07 (the start of the stts box), so we know the mp4a box continues to the end of the stsd box which contains it.
After some information about the channels and sample characteristics, the mp4a box contains an esds box, which unfortunately doesn't seem to be described in any freely available spec. Most of the information comes from ISO/IEC 14496-1:2010, and you can see from the read_esds code that it's a container for an ES_Descriptor, defined in § 7.2.6.5. read_esds reads the remainder of the box (again, at offset 0x0009bbd4 we have 00 00 00 33, 51 bytes long, and 65 73 64 73, "esds", so the esds box goes all the way to the end: 0x0009bc07). This array, starting at offset 0x0009bbe0 and continuing until (but not including) 0x0009bc07, is passed to the find_descriptor function.
According to the spec § 7.2.2.2, BaseDescriptor starts with a 1-byte tag. We see 0x03 at offset 0x0009bbe0, which corresponds to the expected ES_DescrTag. Following that is a variable-length size field described in § 8.3.3 "Expandable classes". You can see how this works from the code of find_descriptor, but basically, for each byte, if the high bit is set, the size continues with the subsequent byte, and the low 7 bits are concatenated together to form the value. If the high bit is not set, the low 7 bits are still concatenated, but the process stops. In this file, we have (starting at offset 0x0009bbe1 after the 0x03 tag) 80 80 80 2a, or:
binary        decimal
---------------------
0b1000_0000   128
0b1000_0000   128
0b1000_0000   128
0b0010_1010   42
The high bit is set on the first 3 bytes, so we continue accumulating the size, but their low 7 bits are all 0, so we end up with 0000_0000_0000_0000_0000_0010_1010 or 42. Per the specification:
The size information shall not include the number of bytes needed for the size and the object_id encoding.
So this size indicates that the ES_Descriptor should continue for 42 bytes starting after the size, meaning from offset 0x0009bbe5. However, adding 42 gives offset 0x0009bc0f, and we know that the esds box (and the mp4a and stsd boxes which contain it) ends before offset 0x0009bc07. So, I think the encoding here is incorrect.
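The size decoding and the overrun can be checked with a few lines of code. This is a sketch of the rule as described above, not the actual find_descriptor code; read_descriptor_header and the constant names are invented for illustration, and the 4-byte cap on the size field is an assumption based on the 28-bit value discussed above.

```python
PAYLOAD_START = 0x0009BBE0  # first byte after the esds version/flags
ESDS_END = 0x0009BC07       # exclusive end of the esds box

# The 39 bytes of esds payload, copied from the hex dump.
PAYLOAD = bytes.fromhex(
    "03 80 80 80 2a 00 00 00 04 80 80 80 1c 40 15 00"
    "18 00 00 00 6d 60 00 00 6d 60 05 80 80 80 02 15"
    "90 06 80 80 80 01 02"
)


def read_descriptor_header(buf, pos):
    """Return (tag, size, body_pos) for the descriptor starting at buf[pos]."""
    tag = buf[pos]
    pos += 1
    size = 0
    for _ in range(4):  # assumed maximum length of the size field
        byte = buf[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)  # append the low 7 bits
        if (byte & 0x80) == 0:              # high bit clear: last size byte
            break
    return tag, size, pos


tag, size, body = read_descriptor_header(PAYLOAD, 0)
declared_end = PAYLOAD_START + body + size
overrun = declared_end - ESDS_END
print(f"tag=0x{tag:02x} size={size} declared end=0x{declared_end:08x} overrun={overrun}")
```

A positive overrun means the descriptor claims more bytes than its enclosing box has left, which is the condition the parser rejects.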
What exactly it should be is a little involved. This length is for an ES_Descriptor box, which contains a DecoderConfigDescriptor. We can see that starting at offset 0x0009bbe8, since we know it starts with tag 0x04 and has a similar pattern for the length: 80 80 80 1c. However, that length appears to have a similar problem: it converts to a value of 28, and when added to the offset 0x0009bbed (immediately following the length itself) we get 0x0009bc09, which is beyond the end of the esds box. It's not immediately clear what the length should be.
Continuing on, the specification indicates a DecoderConfigDescriptor contains 0 or 1 DecoderSpecificInfo boxes and a variable number of ProfileLevelIndicationIndexDescriptor boxes. We can see the DecSpecificInfoTag (0x05) at offset 0x0009bbfa followed by 80 80 80 02, indicating a length of 2 bytes. If there were any ProfileLevelIndicationIndexDescriptor boxes after that, we'd expect to see the ProfileLevelIndicationIndexDescrTag (0x14) at offset 0x0009bc01, but instead we see 06 80 80 80 01, indicating a SLConfigDescrTag and a length of 1. An ES_Descriptor box does contain a SLConfigDescriptor following the DecoderConfigDescriptor, which allows us to conclude that the ES_Descriptor box includes the DecoderConfigDescriptor box, which includes the DecoderSpecificInfo box, at which point those two boxes end and the SLConfigDescriptor occurs as part of the ES_Descriptor, continuing right up to the beginning of the stts box at 0x0009bc07. Or, graphically:
offset     box
-------------------------------------
0009bbd4   esds
0009bbe0     ES_Descriptor
0009bbe8       DecoderConfigDescriptor
0009bbfa         DecoderSpecificInfo
0009bc01       SLConfigDescriptor
0009bc07   stts
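The offsets in this layout can be recovered by walking the tags while ignoring the two suspect container sizes. The sketch below does that under stated assumptions: the container descriptors are stepped into by skipping only their fixed-width fields (3 bytes for ES_Descriptor with all optional flags clear, 13 bytes for DecoderConfigDescriptor), and only the leaf sizes are trusted. All names here are hypothetical helpers, not mp4parse-rust functions.

```python
PAYLOAD_START = 0x0009BBE0
PAYLOAD = bytes.fromhex(
    "03 80 80 80 2a 00 00 00 04 80 80 80 1c 40 15 00"
    "18 00 00 00 6d 60 00 00 6d 60 05 80 80 80 02 15"
    "90 06 80 80 80 01 02"
)

NAMES = {
    0x03: "ES_Descriptor",
    0x04: "DecoderConfigDescriptor",
    0x05: "DecoderSpecificInfo",
    0x06: "SLConfigDescriptor",
}
# Fixed-width fields to skip before the first child of a container descriptor.
# Assumption: the ES_Descriptor flags byte is 0, so no optional fields follow it.
FIXED_FIELDS = {0x03: 3, 0x04: 13}


def read_descriptor_header(buf, pos):
    """Return (tag, size, body_pos) using the 7-bits-per-byte size encoding."""
    tag = buf[pos]
    pos += 1
    size = 0
    for _ in range(4):
        byte = buf[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if (byte & 0x80) == 0:
            break
    return tag, size, pos


def layout(buf, base):
    """List (file_offset, name, declared_size) for each descriptor found."""
    rows, pos = [], 0
    while pos < len(buf):
        tag, size, body = read_descriptor_header(buf, pos)
        rows.append((base + pos, NAMES.get(tag, hex(tag)), size))
        # Containers: step into the children. Leaves: skip the declared size.
        pos = body + FIXED_FIELDS.get(tag, size)
    return rows


for offset, name, size in layout(PAYLOAD, PAYLOAD_START):
    print(f"{offset:08x}  {name} (declared size {size})")
```

The walk stops exactly at the end of the 39-byte payload, which is consistent with the SLConfigDescriptor being the last item before stts.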
Knowing that, and the fact that the descriptor box lengths don't include the bytes for the tag or the length itself, the lengths of the first two boxes should be changed accordingly:
offset     old-value   new-value
--------------------------------
0009bbe4   0x2a        0x22
0009bbec   0x1c        0x14
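The two new values follow from the offsets already established, and the arithmetic can be double-checked as below. This is a sketch only: patch_byte is a hypothetical helper, and it is applied to an in-memory copy of the esds payload rather than to the real file.

```python
PAYLOAD_START = 0x0009BBE0
ESDS_END = 0x0009BC07     # exclusive end of esds, i.e. start of stts
HEADER_LEN = 1 + 4        # 1-byte tag + 4-byte size field (80 80 80 xx)

es_body = 0x0009BBE0 + HEADER_LEN   # first byte of the ES_Descriptor body
dc_body = 0x0009BBE8 + HEADER_LEN   # first byte of the DecoderConfigDescriptor body
sl_start = 0x0009BC01               # SLConfigDescriptor tag

es_size = ESDS_END - es_body   # ES_Descriptor runs to the end of esds
dc_size = sl_start - dc_body   # DecoderConfigDescriptor ends where SLConfig begins


def patch_byte(data, offset, old, new):
    """Overwrite one byte, refusing to touch it unless it holds the expected value."""
    if data[offset] != old:
        raise ValueError(f"unexpected byte 0x{data[offset]:02x} at index {offset}")
    data[offset] = new


payload = bytearray.fromhex(
    "03 80 80 80 2a 00 00 00 04 80 80 80 1c 40 15 00"
    "18 00 00 00 6d 60 00 00 6d 60 05 80 80 80 02 15"
    "90 06 80 80 80 01 02"
)
patch_byte(payload, 0x0009BBE4 - PAYLOAD_START, 0x2A, es_size)
patch_byte(payload, 0x0009BBEC - PAYLOAD_START, 0x1C, dc_size)
print(f"ES_Descriptor size 0x{es_size:02x}, DecoderConfigDescriptor size 0x{dc_size:02x}")
```

Checking the old value before writing guards against patching the wrong offset, which matters when the same two edits are applied to a full file at the absolute offsets in the table.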
Finally, we can check this against the mp4dump utility. With the original file as input, we get a crash:
[esds] size=12+39
  [ESDescriptor] size=5+42
    es_id = 0
    stream_priority = 0
    [DecoderConfig] size=5+28
      stream_type = 5
      object_type = 64
      up_stream = 0
      buffer_size = 6144
      max_bitrate = 28000
      avg_bitrate = 28000
      DecoderSpecificInfo = 15 90
    [Descriptor:06] size=5+1
Segmentation fault: 11
and with the updated file (attached):
[esds] size=12+39
  [ESDescriptor] size=5+34
    es_id = 0
    stream_priority = 0
    [DecoderConfig] size=5+20
      stream_type = 5
      object_type = 64
      up_stream = 0
      buffer_size = 6144
      max_bitrate = 28000
      avg_bitrate = 28000
      DecoderSpecificInfo = 15 90
    [Descriptor:06] size=5+1
So, all that is enough to get the file to pass the parser, but there are still other issues with the audio that cause errors when trying to decode it. Removing the audio track works, but I'd need to spend some more time digging to see what's going on.
Comment 5 • Reporter • 5 years ago
Thank you for the detailed write up. I can confirm the parsing error is no longer showing up with the fixed MP4 file though the audio remains missing as you stated. I will debug the LIVE555 library that created the ESDS atom/box to see how this could have occurred.
For the video in comment 4 I get audio in Fx under Linux, but not under Windows.
If I take the original file and ask ffmpeg to remux it then it works for me in Fx everywhere.
ffmpeg.exe -i B0C55431BD50-evtid-5EBEE83A-5EBEE858.mp4 -c copy re-mux2.mp4
ffmpeg version git-2020-03-11-36aaee2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9.2.1 (GCC) 20200122
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuda-llvm --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
libavutil 56. 42.100 / 56. 42.100
libavcodec 58. 75.100 / 58. 75.100
libavformat 58. 41.100 / 58. 41.100
libavdevice 58. 9.103 / 58. 9.103
libavfilter 7. 77.100 / 7. 77.100
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'B0C55431BD50-evtid-5EBEE83A-5EBEE858.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42isom
creation_time : 2020-05-15T19:06:34.000000Z
Duration: 00:00:30.34, start: 0.000000, bitrate: 170 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, stereo, fltp, 26 kb/s (default)
Metadata:
creation_time : 2020-05-15T19:06:34.000000Z
handler_name : ?Apple Sound Media Handler
Stream #0:1(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt470bg/bt470bg/smpte170m), 1280x720, 141 kb/s, 15 fps, 15 tbr, 600 tbn, 30 tbc (default)
Metadata:
creation_time : 2020-05-15T19:06:34.000000Z
handler_name : ?Apple Video Media Handler
encoder : H.264
Output #0, mp4, to 're-mux2.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42isom
encoder : Lavf58.41.100
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt470bg/bt470bg/smpte170m), 1280x720, q=2-31, 141 kb/s, 15 fps, 15 tbr, 19200 tbn, 600 tbc (default)
Metadata:
creation_time : 2020-05-15T19:06:34.000000Z
handler_name : ?Apple Video Media Handler
encoder : H.264
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, stereo, fltp, 26 kb/s (default)
Metadata:
creation_time : 2020-05-15T19:06:34.000000Z
handler_name : ?Apple Sound Media Handler
Stream mapping:
Stream #0:1 -> #0:0 (copy)
Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame= 453 fps=0.0 q=-1.0 Lsize= 627kB time=00:00:30.20 bitrate= 170.1kbits/s speed=7.58e+03x
video:523kB audio:99kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.781984%
You can see it changes the order of the streams, but it shouldn't be modifying them. Assuming this is true, the fact that the file now works suggests there is an issue in the metadata. Comparing the output of the remux to comment 4's fixed file, it appears that the data in the esds box is different. I don't know for sure, but I suspect this metadata is bad, and when we give it to decoders on Windows and macOS those decoders become incorrectly configured and fail to decode. On Linux, if ffmpeg is present we will use it, and I suspect it behaves more robustly (as it does when asked to remux). This also explains why the file works in VLC and Chrome.
So it seems like a muxing issue to me.
Comment 7 • Assignee • 5 years ago
I've also tried remuxing with ffmpeg (attached) on macOS and confirm it creates a file which plays correctly (with sound) on Firefox, Chrome and QuickTime Player.
Unfortunately, dumping the structure of the two files shows a lot of differences, so there's not an obvious thing to point to, but I feel fairly confident in saying that there are some additional errors in the way the original file was muxed.
I'm going to close this since I don't think Firefox's behavior here is wrong (even if some other tools are more lenient about processing files with errors). If you discover something contrary to that, feel free to comment and we can look at addressing it.