Closed
Bug 942078
Opened 11 years ago
Closed 3 years ago
Video thumbnail generation rule
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: bwu, Unassigned)
References
Details
(Whiteboard: [ucid:multimedia10, 1.5, ft:multimedia-platform])
Current video's thumbnail is generated by seeking to 1/4 video duration and getting that I-frame decoded. This frame may be not good since it could be black screen or some meaningless frame to users.
Another way is to decode the largest I-frame which more likely contains rich information or has high quality.
Comment 1•11 years ago
|
||
Hi Chris & David,
According to Gecko media & Gaia video app domain, may we know your opinion on this improvement of generating video thumbnail?
Thanks.
Flags: needinfo?(dflanagan)
Flags: needinfo?(chris.double)
Comment 2•11 years ago
|
||
(In reply to Marco Chen [:mchen] from comment #1)
>
> According to Gecko media & Gaia video app domain, may we know your opinion
> on this improvement of generating video thumbnail?
I think this is a good approach. A while back we looked at doing a thumbnail specific API in the media code and that was one of the approaches suggested (bug 873959).
Flags: needinfo?(chris.double)
Comment 3•11 years ago
|
||
The current approach comes from very early UX specs for the video app. The idea is that many professional videos begin with a blank frame, so it doesn't make sense to start at the first frame. IIRC, we don't do 1/4 of the duration. We do 10 seconds or 1/4, which ever is smaller.
But there is no real logic to it, we just hope that that is a better frame than the first one is.
If you want to do something better, that would be great. Just picking the largest frame seems like it might get you an image that was full of text or something else that is hard to compress. It seems like it would be better if you could do something like start at the beginning and look for a jump in frame size to find the first non-trivial frame.
But I'm completely out of my expertese here. If you can do something smart here, we'll certainly use it for the video app.
Needinfo John to bring this to his attention.
Flags: needinfo?(dflanagan) → needinfo?(johu)
Comment 4•11 years ago
|
||
Thanks for bringing this to me, David.
I don't think the frame at 1/4 will be totally useless. We uses canvas to draw the thumbnail and assume that the Video element helps to decode to the frame of specified time. That may or may not be a I-frame but at least a frame which is constructed from it's I-frame and applying some P/B-frames to have the correct image. From my knowledge of H.264, an I-frame contains full image/information and all frames after that I-frame are constructed from that I-frame. It shouldn't be a black or partial black image if we constructed a frame from I-frame and applying all frame changes after that I-frame to the specified time.
Another issue is every video may have different I-frame interval. Some of them may use constant time interval but some of them are not. And the creator of that video can change the strategy of creating I-frame.
I feel if we can find a frame whose color histogram is as a bell shape, that may be a good frame to be the video thumbnail.
Flags: needinfo?(johu)
Reporter | ||
Comment 5•11 years ago
|
||
Usually when seeking in video playback, it seeks to the nearest I-frame around the seeking time. The reason behind this is if it seeks to the frame with exact seeking time, that frame may not be a I-frame. If it is not I-frame, media extractor still needs to find the I-frame before it and decode I-frame with the following P/B frames until reaching the seeking time. This may take some time and user expericen may not be good.
The reason to pick up the largest size of I-frames is if the raw data of that frame is black or plain, the encoded size would be small. From my understanding of H.264, the more information it contains, the bigger size it has. So it is a pretty simple way to filter out some "bad" frames.
For checking color histogram, it is also a better way. However, it may need to take some time to decode and analyze each I-frame. Not sure how to do it in a quick way.
Comment 6•11 years ago
|
||
(In reply to Blake Wu from comment #5)
> If it is not
> I-frame, media extractor still needs to find the I-frame before it and
> decode I-frame with the following P/B frames until reaching the seeking
> time. This may take some time and user expericen may not be good.
This is what the HTML video element does. It seeks to the nearest key frame then decodes up to the requested seek frame. Implementing the faster, but less accurate, method of seeking to key frame only is bug 778077.
Comment 7•11 years ago
|
||
Wow! This's a pretty old bug. I never found that we are going to implement the fast seek. That may be very useful to deal this kind of . There is an extreme case for this kind of API. We may have a key frame per minute or more which causes the fast seek absolute out of sync. And I know that's the trade-off.
I think Blake is trying to fix two things at one shot: 1. fast seek and 2. the significant thumbnail for this video. For the first one, that is exactly the bug 778077. For the second one, that 's largest key-frame.
From the view of gaia, we don't have the API to do "fast seek". If we have, I feel that's good to use it to generate thumbnail. But I still wondering that what's the difference if we use hardware codec. That should be very fast and user may not be aware of the difference. That does provide a large improvement when we fallback to software codec.
(In reply to Chris Double (:doublec) from comment #6)
> This is what the HTML video element does. It seeks to the nearest key frame
> then decodes up to the requested seek frame. Implementing the faster, but
> less accurate, method of seeking to key frame only is bug 778077.
Comment 8•11 years ago
|
||
bug 873959 seems easy way to implement this. In android stagefright, mp4 file's thumbnail is determined biggest sample in 20 sync samples from front.
SampleTable::findThumbnailSample()
http://androidxref.com/4.4_r1/xref/frameworks/av/media/libstagefright/SampleTable.cpp#727
Comment 9•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> bug 873959 seems easy way to implement this. In android stagefright, mp4
> file's thumbnail is determined biggest sample in 20 sync samples from front.
>
> SampleTable::findThumbnailSample()
> http://androidxref.com/4.4_r1/xref/frameworks/av/media/libstagefright/
> SampleTable.cpp#727
Thumbnail time can be checked also by gaia side.
Comment 10•11 years ago
|
||
Hi Sotaro,
We don't have libstagefright in Gecko, see bug 778052. So, We can't have thumbnail time in metadata. We may only have it in ogg file format, since the landing of bug 763010.
Updated•11 years ago
|
Summary: Video thumbnail generation rule → [fugu]Video thumbnail generation rule
Reporter | ||
Updated•11 years ago
|
Assignee: nobody → bwu
Comment 11•11 years ago
|
||
Hi PeiPei,
May I know why you change the summary to include [fugu]?
This is a general improvement for all platform not for fugu only.
Thanks.
Flags: needinfo?(pcheng)
Comment 12•11 years ago
|
||
(In reply to Marco Chen [:mchen] from comment #11)
> Hi PeiPei,
>
> May I know why you change the summary to include [fugu]?
> This is a general improvement for all platform not for fugu only.
>
> Thanks.
This is to keep track of fugu must have issues. James, could you please clarify?
Flags: needinfo?(pcheng) → needinfo?(james.zhang)
Comment 13•11 years ago
|
||
All platform has this issue, not fugu only. You can remove fugu from summary.
Flags: needinfo?(james.zhang)
Comment 14•11 years ago
|
||
remove [fugu]
Summary: [fugu]Video thumbnail generation rule → Video thumbnail generation rule
Updated•11 years ago
|
Whiteboard: [ucid:Device14, 1.4:p2, ft:devices]
Reporter | ||
Comment 15•11 years ago
|
||
Hi All,
I list two possible designs for thumbnail generation in the below link. Please have a look and advice.
https://docs.google.com/file/d/0B98EFeZYRJgjdGQxUER4cG9IVTg/edit
The ideas of these two designs are
Design A
1. Get thumbnail time from loaded metadata but this is not ready and blocks on bug 778052
2. Seek to this time to get thumbnail
Design B
1. Introduce a new class, MediaDataRetriever, to get the decoded frame via media extractor and omx decoder (for example) without going through media decoders, media state machine, and media reader.
2. Introduce a new value,thumbnail, in preload.
3. Introduce a new Web API(mozSetThumbnail) or property(thumbnailTime) and a event(ongetthumbnail) to notify that thumbnail is ready.
API or property is optional which is used flexibly for APP to decide the thumbnail time.
3.1 mozGetThumbnail(double time)
Ex.
var video = document.createElement('video');
video.preload = 'thumbnail';//new
video.mozSetThumbnailTime(time); //new and optional
video.src = url;
video.ongetthumbnail = function() { //new
:
captureFrame(video, metadata,callback);
:
}
3.2 t
Ex.
var video = document.createElement('video');
video.preload = 'thumbnail';//new
video.thumbnailTime = time; //new and optional
video.src = url;
video.ongetthumbnail = function() { //new
:
captureFrame(video, metadata,callback);
:
}
For better flexibility and efficiency, Design B would be better. So currently I am working on 3.2 as above.
Should you have other ideas, please share with us.
Comment 16•11 years ago
|
||
From the view of gaia, I don't think those two designs have difference. But we may need to know which one has better performance..
Comment 17•11 years ago
|
||
(In reply to John Hu [:johnhu] from comment #16)
> From the view of gaia, I don't think those two designs have difference. But
> we may need to know which one has better performance..
The drawback of design A is pointed out in the link at comment 15.
I just help Blake to post it here.
1. Pros: The thumbnail time reported from metadata will be a I-Frame so we avoid to decode multiple frames for final frame.
2. Cons: The seek operation will perform into each video and audio codec and it wasted resources on seeking audio which we don't need it.
Updated•11 years ago
|
Blocks: 1.4-devices
Reporter | ||
Comment 18•11 years ago
|
||
Hi Chris and David,
May I have your opinions on these two designs?
Thanks!
Flags: needinfo?(dflanagan)
Flags: needinfo?(chris.double)
Comment 19•11 years ago
|
||
I'm not a fan of adding the 'thumbnail' option to 'preload'. What is the exact meaning of 'thumbnail' compared to the existing 'preload' attributes? Does thumbnail operate like 'auto', 'metadata' or 'none' after the thumbnail is loaded for example.
Flags: needinfo?(chris.double)
Reporter | ||
Comment 20•11 years ago
|
||
The original idea is 'thumbnail' could be like 'metadata' which requires some time to process.
For metadata,
gaia set preload = 'metadata' and get the metadata via onloadedmetadata.
So that is why to use the same way to do it.
set preload = 'thumbnail' and get it via ongetthumbnail.
Updated•11 years ago
|
Blocks: 1.4-devices-targeted
Updated•11 years ago
|
No longer blocks: 1.4-devices
Comment 21•11 years ago
|
||
(In reply to Blake Wu [:bwu] from comment #18)
> Hi Chris and David,
>
> May I have your opinions on these two designs?
>
> Thanks!
I feel that I only barely understand the techincal issues here. For example, I've never heard of an I-frame before.
However, here are my thoughts:
1) adding a new thumbnail attribute value, a new method or property and a new event handler to the video element is a lot of new standardization. I would not go down that path unless you've already initiated discussions on the standards mailing lists.
2) I agree with Chris that preload="thumbnail" is underspecified, and think it would be difficult to distinguish this case from preload="metadata".
3) In regular desktop Firefox, what is the default behavior going to be? If preload is not "none", will the video element automatically start loading the URL and find a thumbnail frame to display? If so, then it certainly makes sense to have an API that exposes either the thumbnail or the fact that the thumbnail is ready. If you are going to be fetching the thumbnail for display with preload set to "metadata" then you can just fire onthumbnailready when you have it and don't need a new value for preload.
4) Do some videos have embedded metadata that specifies a thumbnail time? Are you proposing to honor that metadata when specified, or to just allow the user to specify the time explicitly if they choose? I think it would be better to have a simple API that honors the metadata when possible and otherwise makes a best guess. If the user wants to specify a time explicitly, they can do that with currentTime and onseeked. They don't need a different property for thumbnail time.
5) Instead of requiring the developer to create a canvas and capture the thumbnail frame with copyImage(), have you considered just defining a getThumbnail() function? It could take a callback or could return a Promise. I've just learned about ImageBitmap, and I know that Shih-Chiang is working on implementing ImageBitmap as part of his work on 854795, so maybe you could return the thumbnail as an ImageBitmap (assuming that would be efficient). I don't feel strongly about that.
I think these various points are probably contradictory, and my advice is uninformed, so please don't take anything I say too seriously!
Flags: needinfo?(dflanagan)
Reporter | ||
Comment 22•11 years ago
|
||
Hi Chris,
Thanks for your great feedback. My comment is inline.
(In reply to David Flanagan [:djf] from comment #21)
> (In reply to Blake Wu [:bwu] from comment #18)
> > Hi Chris and David,
> >
> > May I have your opinions on these two designs?
> >
> > Thanks!
>
> I feel that I only barely understand the techincal issues here. For
> example, I've never heard of an I-frame before.
>
> However, here are my thoughts:
>
> 1) adding a new thumbnail attribute value, a new method or property and a
> new event handler to the video element is a lot of new standardization. I
> would not go down that path unless you've already initiated discussions on
> the standards mailing lists.
Got it. To have a efficient discussion in the standard mailing list, I would like to sync with you and Chris first.
> 2) I agree with Chris that preload="thumbnail" is underspecified, and think
> it would be difficult to distinguish this case from preload="metadata".
My original purpose is to distinguish meatadata from thumbnail. But it looks like not work well :)
More reasons are explained in my comment in 3).
> 3) In regular desktop Firefox, what is the default behavior going to be? If
> preload is not "none", will the video element automatically start loading
> the URL and find a thumbnail frame to display? If so, then it certainly
> makes sense to have an API that exposes either the thumbnail or the fact
> that the thumbnail is ready. If you are going to be fetching the thumbnail
> for display with preload set to "metadata" then you can just fire
> onthumbnailready when you have it and don't need a new value for preload.
When preload is none, nothing will be retrieved from video file including metadata.
When preoload is metadata, only metadata will be retrieved. No any video frame level will be processed. That's why I think a new attribute value may be needed. Besides if we put thumbnail fetch in "meatadata", that would create unnecessary actions/processes if user just want to get metadata only.
> 4) Do some videos have embedded metadata that specifies a thumbnail time?
Maybe but it is not common.
> Are you proposing to honor that metadata when specified, or to just allow
> the user to specify the time explicitly if they choose? I think it would be
> better to have a simple API that honors the metadata when possible and
> otherwise makes a best guess. If the user wants to specify a time
> explicitly, they can do that with currentTime and onseeked. They don't need
> a different property for thumbnail time.
Just allow the user to specify the time if they want.
Using currentTime will seek the unnecessary audio part, so that's why we don't prefer it.
> 5) Instead of requiring the developer to create a canvas and capture the
> thumbnail frame with copyImage(), have you considered just defining a
> getThumbnail() function? It could take a callback or could return a
> Promise. I've just learned about ImageBitmap, and I know that Shih-Chiang
> is working on implementing ImageBitmap as part of his work on 854795, so
> maybe you could return the thumbnail as an ImageBitmap (assuming that would
> be efficient). I don't feel strongly about that.
I have thought this as well, but don't see how ImageBitmap can benefit. I will discuss with Shin-Chiang further. Thanks for your information.
> I think these various points are probably contradictory, and my advice is
> uninformed, so please don't take anything I say too seriously!
Reporter | ||
Comment 23•11 years ago
|
||
Hi All,
With some considerations for future flexibility today, now I also think it will be better to have one dedicated api instead of new attribute value in preload :) I recap the information in Background, Design, Advantages, and Changes as below. May I have your opinions again?
Background:
A original reason to put thumbnail in the "preload" is to avoid metadata loading and thumbnail fetch running at the same time (call preload = "metadata" and mozGetThumbnail() ) which may cause codec resource conflict. However, for future flexibility and multi-core support we should let these two things be able to run in parallel. Currently, if gaia set those two at the same time, the later will be returned a error. In current gaia design, createThumbnail is in onloadedmetadata event, so there is no big impact for gaia part.
Design:
Introduce a new Web API, mozGetThumbnail (double thumbnailTime), and a event(ongetthumbnail) to notify that thumbnail is ready.
thumbnailTime is optional. If it is negative, a default thumbnail generation mechanism in the gecko will apply. Otherwise, an I-frame close to that time will be chosen as thumbnail.
Advantages:
For this new design,
1. Remove unnecessary audio seeking in current design.
2. Better to improve, like apply software codec and current mechanism in gecko is complicated and unnecessary, like media decoder state machine.
3. Easy and clear api to use for gaia developer.
Change:
For Gecko change,
Introduce a new class, MediaDataRetriever, to get the decoded frame via media extractor and omx decoder (for example) without going through media decoders, media state machine, and media reader.
For gaia change,
No much changes. Two lines in metadata.js are required to be changed, line#249 and line#264 as shown below.
249 //offscreenVideo.currentTime = Math.min(5, offscreenVideo.duration / 10);
offscreenVideo.mozGetThumbnail(time);
251 var failed = false; // Did seeking fail?
252 var timeout = setTimeout(fail, 10000); // Fail after 10 seconds
253 offscreenVideo.onerror = fail; // Or if we get an error event
254 function fail() { // Seeking failure case
255 console.warn('Seek failed while creating thumbnail for', videofile.name,
256 '. Ignoring corrupt file.');
257 failed = true;
258 clearTimeout(timeout);
259 offscreenVideo.onerror = null;
260 metadata.isVideo = false;
261 unload();
262 callback(metadata);
263 }
264 // offscreenVideo.onseeked = function() { // Seeking success case
offscreenVideo.ongetthumbnail = function() {
265 if (failed) // Avoid race condition: if we already failed, stop now
266 return;
267 clearTimeout(timeout);
268 captureFrame(offscreenVideo, metadata, function(poster) {
269 if (poster === null) {
270 // If something goes wrong in captureFrame, it probably means that
271 // this is not a valid video. In any case, if we don't have a
272 // thumbnail image we shouldn't try to display it to the user.
273 // XXX: See bug 869289: maybe we should not fail here.
274 fail();
275 }
276 else {
277 metadata.poster = poster;
278 unload();
279 callback(metadata); // We've got all the metadata we need now.
280 }
281 });
282 };
283 }
Updated•11 years ago
|
No longer blocks: 1.4-devices-targeted
Comment 24•11 years ago
|
||
The simpler API looks good but I'd suggest asking for feedback on mozilla.dev.webapi for feedback on the Web based interface. They'll be able to advise on the current standard for using events vs other async mechanisms. There will also need to be an implementation for non-OMX backends.
Updated•11 years ago
|
Whiteboard: [ucid:Device14, 1.4:p2, ft:devices] → [ucid:multimedia10, 1.4:p2, ft:multimedia-platform]
Reporter | ||
Comment 25•11 years ago
|
||
Add one more advantage.
(In reply to Blake Wu [:bwu] from comment #23)
> Advantages:
> For this new design,
> 1. Remove unnecessary audio seeking in current design.
> 2. Better to improve, like apply software codec and current mechanism in
> gecko is complicated and unnecessary, like media decoder state machine.
> 3. Easy and clear api to use for gaia developer.
4. Improve thumbnail generating time (current seeking waits the seekTime frame from the previous keyframe as comment #6)
Updated•11 years ago
|
Blocks: 1.4-multimedia
Comment 26•11 years ago
|
||
We are no longer taking features for 1.3, so this is not a blocker.
blocking-b2g: 1.3? → -
Comment 27•11 years ago
|
||
According to the user story doc, this is no longer being tracked as part of 1.4.
No longer blocks: 1.4-multimedia
Reporter | ||
Comment 28•11 years ago
|
||
Add API discussion link,
https://groups.google.com/forum/#!topic/mozilla.dev.webapi/N9BW0V7MUHA
Reporter | ||
Comment 29•11 years ago
|
||
One another idea is
Do both finding the proper key-frame and decoding (SW decoder) it via JavaScript to be a self-contained module which should be able to save memory (normally HW decoder requires to allocate many memory buffers even there is only one frame for decoding) and could be reused cross-platform. But the performance (parsing and decoding time) of this module would be needed to check.
Reporter | ||
Comment 30•11 years ago
|
||
Add Bug 971645 as a blocking bug since that can improve the performance of seek which video thumbnail generation requires as well.
Blocks: 971645
Updated•11 years ago
|
blocking-b2g: - → ---
Whiteboard: [ucid:multimedia10, 1.4:p2, ft:multimedia-platform] → [ucid:multimedia10, 1.5, ft:multimedia-platform]
Updated•11 years ago
|
Blocks: b2g-multimedia
Updated•11 years ago
|
Component: General → Video/Audio
Product: Firefox OS → Core
Updated•9 years ago
|
Component: Audio/Video → Audio/Video: Playback
Updated•7 years ago
|
Rank: 15
Priority: -- → P2
Comment 31•3 years ago
|
||
The bug assignee didn't login in Bugzilla in the last 7 months.
:jimm, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee: blakewu0205 → nobody
Flags: needinfo?(jmathies)
![]() |
||
Updated•3 years ago
|
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jmathies)
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•