Last Comment Bug 382267 - (video) Implement WHATWG Video spec
(video)
: Implement WHATWG Video spec
Status: VERIFIED FIXED
:
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: All All
: P1 enhancement with 39 votes (vote)
: ---
Assigned To: cajbir (:cajbir)
:
Mentors:
: 381758 (view as bug list)
Depends on: 448910 517470 448680 448909 449282 449307 449315 449522 451004 451457 452698 454683 454686 456648 459753 459765 459938 460871 461135 461281 461287 462082 462378 462570 462953 462954 462956 462957 462958 462959 462960 462961 462962 463950 483573 489631 603034
Blocks: 336164 422538 422540 435298 435339 461286 761274
  Show dependency treegraph
 
Reported: 2007-05-28 16:47 PDT by cajbir (:cajbir)
Modified: 2012-06-04 12:45 PDT (History)
121 users (show)
dsicore: wanted1.9.1+
roc: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
First stab at impementing <video> element (82.07 KB, patch)
2007-07-04 20:48 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 2 of Video patch (381.12 KB, patch)
2007-07-12 22:53 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 3 of the <video> element patch (432.78 KB, patch)
2007-08-08 00:37 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Updated patch with 'configure' file removed (206.54 KB, patch)
2007-08-08 00:48 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 5 of the <video> patch (215.43 KB, patch)
2007-09-30 19:01 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 6 of the <video> element patch (217.68 KB, patch)
2007-10-04 21:27 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 7 of the <video> element patch (220.65 KB, patch)
2007-10-24 21:47 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 8 of the <video> element patch (219.83 KB, patch)
2007-10-29 19:33 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 9 of the <video> element patch (221.97 KB, patch)
2007-10-29 22:34 PDT, cajbir (:cajbir)
roc: superreview+
Details | Diff | Splinter Review
Version 10 of the <video> element patch (222.05 KB, patch)
2007-11-02 04:09 PDT, cajbir (:cajbir)
roc: superreview+
Details | Diff | Splinter Review
Version 11 of the <video> element patch (222.06 KB, patch)
2007-11-05 17:49 PST, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
version 12 of the video element patch (134.25 KB, patch)
2008-03-12 20:54 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
Version 13 of the video element patch (181.10 KB, patch)
2008-05-28 16:02 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review
version 14 of the video element patch (184.67 KB, patch)
2008-07-01 23:33 PDT, cajbir (:cajbir)
jst: review+
roc: superreview+
Details | Diff | Splinter Review
Address review comments by jst (3.62 KB, patch)
2008-07-08 22:01 PDT, cajbir (:cajbir)
roc: review+
roc: superreview+
Details | Diff | Splinter Review
Some tests for the video element patch (18.69 KB, patch)
2008-07-08 22:09 PDT, cajbir (:cajbir)
no flags Details | Diff | Splinter Review

Description cajbir (:cajbir) 2007-05-28 16:47:07 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Build Identifier: 

Implement the <video/> element as defined in the WHATWG specification for playback of Theora videos:

http://www.whatwg.org/specs/web-apps/current-work/#video0



Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Comment 1 Jesse Ruderman 2007-05-28 17:39:40 PDT
*** Bug 381758 has been marked as a duplicate of this bug. ***
Comment 2 cajbir (:cajbir) 2007-07-04 20:48:51 PDT
Created attachment 270987 [details] [diff] [review]
First stab at impementing <video> element

This patch is a start at implementing the WHATWG <video> element. It successfully plays Ogg Theora videos on Mac, Windows and Linux with some issues which I'm working on.

On Mac OS X videos play in debug ok, but there is no sound in optimized builds. On Windows sound works fine in optimized builds. On Linux (only 64 bit linux tested so far) it plays slowly with no sound.
In all cases on optimized builds Firefox segfaults on exit. Problems also occur if streaming video is played and played over a slow link. 

I know about all those issues and am working on them. I'm also using channels  some things in thread unsafe ways (like channels) which I am now working on correcting.

But it does play videos for those so here's the first patch attempt for those that want to try it out. So in summary, use a debug until I've corrected these issues over the next few days.

This patch does not include the various Ogg and Portaudio libraries. That file is about 20MB so I've put it separately here: http://www.double.co.nz/video_test/third_party_modules.patch.gz

Apply the third party modules patch first, and then this one. Some example usage is available here: http://www.double.co.nz/video_test

You can download the example usage HTML and ogg files: http://www.double.co.nz/video_test/video_test.tar.gz

Another thing I plan to add is a configure option to not include the video option.
Comment 3 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-07-05 01:27:19 PDT
I think probably the configure option(s) should be at the codec level --- i.e. --enable-theora/--disable-theora. And, if no codecs are enabled, all the support code for the video element should be disabled. Something similar for audio.
Comment 4 Ivo Emanuel Gonçalves 2007-07-07 19:15:15 PDT
While this is still an early stage, I would like to remind that the <audio> works similarly to <video>, so do not forget it as this patch matures.  Furthermore, Vorbis and Speex support for <audio> would be great.
Comment 5 cajbir (:cajbir) 2007-07-12 22:53:01 PDT
Created attachment 272132 [details] [diff] [review]
Version 2 of Video patch

This new version of the patch to implement <video> adds/fixes the following:

* Ogg codec support can be enabled/disabled with configure flag --disable-ogg. Currently if the Ogg codec is disabled then the video element is disabled too. In the future if/when other codecs are supported this can change.
* Fix build problems when doing libxul enabled builds
* Fix link error on windows when doing a --disable-libxul build
* Fix colour playback issues on Linux and Windows 
* Handle no audio device being present
* Adjust element size when video size information is read from the Ogg file
* No longer use channel across threads
* Various refactorings based on email feedback

A couple of issues still to track down:

1) Sound not working on Linux
2) Sound not working on Mac OS X optimized builds

Remember you need to first apply the third party library patch at:

http://www.double.co.nz/video_test/third_party_modules.patch.gz

This has been updated since the last patch for some updates to the libraries.
Comment 6 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-07-13 02:13:36 PDT
A few comments:

Remove "configure" changes from subsequent patches.

What is MOZ_NO_CONFIG?

-DIRS		= base canvas media html xml xul xbl xslt
+DIRS		= base canvas html xml xul xbl xslt

What state is your tree in? "media" is not present on trunk.

We'll need to figure out a final destination for portaudio and the Ogg libraries. "content" isn't it.

You should probably start making your printfs #ifdef DEBUG_doublec, or (better) convert them to PR_LOG.

+class VideoInvalidateEvent : public nsRunnable {
+public:
+  VideoInvalidateEvent(nsIFrame* f) : mFrame(f) {}

There will be problems if the frame goes away while one of these events is pending. Similar issues apply to your other events. There are two ways to solve this:
-- Keep a reference to pending events from the content element or (in this case) the frame, and if the element/frame is destroyed while an event is pending, call a method on the event to disconnect it from the data that's going away. nsRevocableEventPtr can help with that.
-- Hold strong (nsCOMPtr/nsRefPtr) references to content elements from events so the element can't be deleted until the event has fired. You can't hold strong references to frames, since they're not refcounted, but you can hold a reference to the frame's element and simply re-lookup the frame when the event fires, via an nsIPresShell (which you can get from a frame, and also should hold a strong reference to). Of course you have to be aware there might no longer be a frame.

These events need comments explaining what they're for and what thread they run on. In fact some documentation for the general plan here would be good.

It seems that you have threads other than the main thread touching content elements and possibly even frames. This is not good. Any state that needs to be shared across threads should be broken out into its own object. This might be a good time to break out the ogg-playing code into its own object so there's less ogg-specific stuff in nsHTMLVideoElement.

+  virtual ~VideoInvalidateEvent() { }

This is not needed.

+  PresentationEvent(nsHTMLVideoElement* element, nsCOMPtr<nsIThread>& decodeThread, OggPlay* player, PaStream* audio_stream, double framerate, double lastVideoFrameTime) :

You generally do not want to pass nsCOMPtrs as parameters. Just take an nsIThread* here. Also, naming conventions: aAudioStream, etc.

+  mDecoding = false;

PR_FALSE (and PR_TRUE elsewhere, etc)

+      mAudioThread = 0;

nsnull

+    if (mRGB) {
+      PR_Free(mRGB);
+    }

Use nsAutoPtr/nsAutoArrayPtr to avoid manual freeing of buffers. You'll need to use operator new instead of PR_Malloc, which is also a good thing :-)

+  unsigned long mFrames;

Use PRUint32 or PRUint64.

+  unsigned char* getVideoRGB();

GetVideoRGB() (similar for other names)

+  bool IsStopped();
+  bool IsDecoding();
+  bool IsPaused();

You should really use PRBool instead of bool, here and elsewhere.

+  NS_IMETHOD Stop();

Don't use NS_IMETHOD unless you're actually implementing an XPCOM interface. You probably just want "virtual void" here.

+class AudioDevice 

Public classes like this should use the ns... prefix. Ditto for ChannelToPipeListener and OggPlayChannelReader.

+  gfxContext* ctx = (gfxContext*)aRenderingContext.GetNativeGraphicData(nsIRenderingContext::NATIVE_THEBES_CONTEXT);

static_cast<>

+    nsIDeviceContext *idc;
+    aRenderingContext.GetDeviceContext(idc);
+    double p2a = idc->AppUnitsPerDevPixel();

Use GetPresContext()->AppUnitsToDevPixels instead.

+    if((unsigned int)mLastFrameSize.width != element->getVideoWidth() ||
+       (unsigned int)mLastFrameSize.height != element->getVideoHeight()) {
+      printf("Frame Needs Reflow!\n");
+      nsPresContext *presContext = PresContext();
+      nsIPresShell *presShell = presContext ? presContext->GetPresShell() : 0;
+      if (presShell) { 
+        presShell->FrameNeedsReflow(this, nsIPresShell::eStyleChange,
+                                    NS_FRAME_IS_DIRTY);

Why are you doing this while painting? Is this the only time you can get the video height/width? Can the video height and width change during the stream or something?

As you know, this needs to change to scale the video to the size of the frame. That will also take care of the situation where there's multiple device pixels per CSS pixel.

I know you just copied stuff from nsImageFrame but GetInnerArea and mBorderPadding can go away; just call GetContentRect() instead of GetInnerArea() (but GetContentRect() returns a rect relative to the parent, so you'll want GetContentRect() - GetPosition(), or something like that)
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-07-13 04:15:45 PDT
One thing that's not implemented yet that's quite important is fallback to the contents of the <video> element when <video> does not support the given media format. You'll have to study how <object> does that...
Comment 8 cajbir (:cajbir) 2007-07-13 04:22:25 PDT
(In reply to comment #6)
> What is MOZ_NO_CONFIG?

I have no idea. Where does this appear?

> 
> What state is your tree in? "media" is not present on trunk.

See the bit in the my comment about the patch requiring the 'third party library' patch to be applied first:

http://www.double.co.nz/video_test/third_party_modules.patch.gz

> We'll need to figure out a final destination for portaudio and the Ogg
> libraries. "content" isn't it.

How about in the root 'modules' directory, where libbz2, etc are? If that is the preferred place I'd need some advice on how to have 'content/media/video' have that as a dependancy.

> -- Keep a reference to pending events from the content element...

To do this I'll need to keep a collection of events. What's the best way of storing an array/list of events? Is there a collection class that can safely hold nsCOMPtr's?

If the events are posted to the queue of a thread which is not the main thread, is it sufficient to stop those threads - therefore the remaining events won't be processed? That's they way I'm currently handling it.

> It seems that you have threads other than the main thread touching content
> elements and possibly even frames. 

All frame stuff is done on the main thread. The only thread code that affects the frame is via an event dispatched to the main thread. I do call GetPrimaryFrame in that non-main thread though, to pass to the invalidate event. I'll move to passing the element to the event and have that call GetPrimaryFrame so it occurs in the main thread. Is that ok?

No element code is touched on other threads except for specific members I added, so I'm pretty sure nothing in the core code is being called off the main thread. I do plan to move that code out into another object though to seperate it from the element.

> You generally do not want to pass nsCOMPtrs as parameters. Just take an
> nsIThread* here. 

They are references to nsCOMPtr's and being assigned in the constructor initialisation to another nsCOMPtr. If I make it a raw nsIThread*, but my original pointer is an nsCOMPtr<nsIThread>, is it safe to do:

  void var(nsIThread* baz) { ...store baz in an nsCOMPtr for later use... } 
  nsCOMPtr<nsIThread> foo = ...;
  call_bar(foo);

Can I pass that foo directly or do I need to call a method on nsCOMPtr to get at the pointer?

> You'll need to
> use operator new instead of PR_Malloc, which is also a good thing :-)

I used operator new in patch1 but another commenter via email suggested using PR_Malloc :-) What's a good strategy for protecting against integer overflow here? Given that the video size is from the ogg file, a manipulated ogg could craft an overflow.

> Why are you doing this while painting? Is this the only time you can get the
> video height/width? Can the video height and width change during the stream or
> something?

Yes, the height and width can change during the stream. I get the video height and width when decoding the video frame data. So I'd be doing it at the same time as I send the event that does the invalidate on the frame. Would it be better to do it there (in the nsHTMLVideoElement code, just before I call Invalidate)?

Thanks for the comments, I'll have them done in the next patch!
Comment 9 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-07-13 05:03:58 PDT
(In reply to comment #8)
> (In reply to comment #6)
> > What is MOZ_NO_CONFIG?
> 
> I have no idea. Where does this appear?

@@ -12371,6 +12274,9 @@ MOZ_NO_ACTIVEX_SUPPORT=1
 MOZ_NO_INSPECTOR_APIS=
 MOZ_NO_XPCOM_OBSOLETE=
 MOZ_NO_FAST_LOAD=
+MOZ_OGG=1
+MOZ_VIDEO=1
+MOZ_NO_CONFIG=

> > What state is your tree in? "media" is not present on trunk.
> 
> See the bit in the my comment about the patch requiring the 'third party
> library' patch to be applied first:
> 
> http://www.double.co.nz/video_test/third_party_modules.patch.gz

Oh I see. I'd assumed that only created new files and didn't patch Mozilla files. It would probably be good to do it that way: one patch that simply creates all files for the non-Mozilla libraries and another patch that patches Mozilla files and creates new Mozilla files.

> > We'll need to figure out a final destination for portaudio and the Ogg
> > libraries. "content" isn't it.
> 
> How about in the root 'modules' directory, where libbz2, etc are? If that is
> the preferred place I'd need some advice on how to have 'content/media/video'
> have that as a dependancy.

I'm not sure. We should ask bsmedberg on IRC or CC him here.

> > -- Keep a reference to pending events from the content element...
> 
> To do this I'll need to keep a collection of events. What's the best way of
> storing an array/list of events? Is there a collection class that can safely
> hold nsCOMPtr's?

Are you sure you need multiple events of the same type pending simultaneously? Which event type(s) would that be? Often it's simpler to just have one event which pulls the "next thing to do" off some internal queue.

> If the events are posted to the queue of a thread which is not the main
> thread, is it sufficient to stop those threads - therefore the remaining
> events won't be processed? That's they way I'm currently handling it.

Remaining events are always processed. I believe thread shutdown waits for all events to be processed before returning. I guess this does work, although it would stop working if we start sharing threads between videos, and it might impose significant delay when we tear down a video element.

> > It seems that you have threads other than the main thread touching content
> > elements and possibly even frames. 
> 
> All frame stuff is done on the main thread. The only thread code that affects
> the frame is via an event dispatched to the main thread. I do call
> GetPrimaryFrame in that non-main thread though, to pass to the invalidate
> event. I'll move to passing the element to the event and have that call
> GetPrimaryFrame so it occurs in the main thread. Is that ok?

That's a good plan. You will have to deal with revoking the invalidate event when the element goes away (or holding a strong ref to the element from the invalidate event).

> No element code is touched on other threads except for specific members I
> added, so I'm pretty sure nothing in the core code is being called off the
> main thread. I do plan to move that code out into another object though to
> seperate it from the element.

Yeah, you might get away with it, but it still smells bad :-)

> > You generally do not want to pass nsCOMPtrs as parameters. Just take an
> > nsIThread* here. 
> 
> They are references to nsCOMPtr's and being assigned in the constructor
> initialisation to another nsCOMPtr. If I make it a raw nsIThread*, but my
> original pointer is an nsCOMPtr<nsIThread>, is it safe to do:
> 
>   void var(nsIThread* baz) { ...store baz in an nsCOMPtr for later use... } 
>   nsCOMPtr<nsIThread> foo = ...;
>   call_bar(foo);
> 
> Can I pass that foo directly or do I need to call a method on nsCOMPtr to get
> at the pointer?

nsCOMPtr auto-converts to a raw pointer in this context, so no extra syntax is required. It's safe.

> > You'll need to
> > use operator new instead of PR_Malloc, which is also a good thing :-)
> 
> I used operator new in patch1 but another commenter via email suggested using
> PR_Malloc :-) What's a good strategy for protecting against integer overflow
> here? Given that the video size is from the ogg file, a manipulated ogg could
> craft an overflow.

I see. I would suggest an explicit size check here. A gigantic memory allocation might not fail but just bring the user's system to a grinding, thrashing halt. Also, you need to check for integer overflow here; generally speaking integer multiplications in an allocation size expression are bad.

> > Why are you doing this while painting? Is this the only time you can get the
> > video height/width? Can the video height and width change during the stream
> > or something?
> 
> Yes, the height and width can change during the stream.

Exciting!

> I get the video height
> and width when decoding the video frame data. So I'd be doing it at the same
> time as I send the event that does the invalidate on the frame. Would it be
> better to do it there (in the nsHTMLVideoElement code, just before I call
> Invalidate)?

Yes. I think it's safe to request a reflow while painting, but it's unusual and best avoided. Also, here requesting reflow during painting is too late: you'll paint a video image when the frame has the wrong size, before it gets resized to the right size. Requesting the reflow when you invalidate should work much better, we'll try to flush reflows before the paint happens so the frame will switch to the right size. There could still be problems where the image we decide to draw isn't the one we saw at invalidate time (because of A/V synch), and therefore still might be the wrong size for the frame, but we can't fix that issue without the compositor. Then again I've never seen a video that changed size halfway through :-).
Comment 10 timeless 2007-07-13 05:21:00 PDT
the modules directory was supposedly closed to new entries, we were trying to kill it.
Comment 11 maikmerten 2007-07-14 02:32:58 PDT
I just want to confirm that the frame dimensions may change during Ogg stream playback. Ogg allows concatenation of Ogg streams, meaning that "cat stream1.ogg stream2.ogg > stream.ogg" will produce a new, valid Ogg stream. This was mostly motivated to allow easy playlist streaming for e.g. web radio (read the data off disk, just push it out into the world).

So the frame dimensions can change, the framerate, the audio channel number, the audio sample rate...

Even codecs may change... if you concatenate an Ogg Vorbis (general purpose audio codec) file with an Ogg Speex (speech codec) file the resulting stream would still be valid Ogg - but I've yet to encounter a playback application that properly handles this. If you only support Ogg Theora + Vorbis (which would by far be the most common and useful combination) that's theoretical speaking anyway ;-)
Comment 12 cajbir (:cajbir) 2007-08-08 00:37:43 PDT
Created attachment 275741 [details] [diff] [review]
Version 3 of the <video> element patch

Remember you need to first apply the third party library patch at:

http://www.double.co.nz/video_test/third_party_modules.patch.gz

This has been updated since the last patch for some updates to the libraries, and to remove the mozilla specific updates as recommended by Robert.

This version of the patch addresses the comments from the various people who reviewed it on this bug, and via email. This includes:

* Sound works on Linux, Windows and Mac.
* Moved from using PortAudio to Sydney Audio, which is included as part of liboggplay which we're using anyway.
* Moved decoding stuff out of nsHTMLVideoElement and into a separate object, nsVideoDecoder. 
* Ditto with the routines to interface with the audio library.
* Various bug fixes
* Implemented some more of the specification.
Comment 13 cajbir (:cajbir) 2007-08-08 00:48:01 PDT
Created attachment 275742 [details] [diff] [review]
Updated patch with 'configure' file removed

Oops, accidentally included 'configure' in the last patch. This one removes it.
Comment 14 Mike Shaver (:shaver -- probably not reading bugmail closely) 2007-08-08 05:31:33 PDT
Cool work, Chris!

Is there a set of tests that people can run a patched build against, perhaps via mochitest, so that we can help start to iron out annoying configuration-based differences and such?
Comment 15 Aaron Leventhal 2007-08-08 06:31:16 PDT
Has any thought been put toward alternative content for accessibility? Has a bug been filed? I don't know if there are any consequences for a11y, such as captions. Do the captions have to be part of the video stream? Etc. etc.

For *any* new features we put into Gecko there needs to be some thought put into a11y. The a11y work also needs to block the release that the new feature is going into.
Comment 16 Hixie (not reading bugmail) 2007-08-08 14:04:34 PDT
There are some things that I still need to look at, but by and large my recommendation would be that we should embed "accessibility" concerns into the video streams themselves. It's not just disabled users who need these tools -- videos need captions for people who have no sound (e.g. who are in an office or in a bar); videos need to be able to play back at different rates for people who don't understand the language well or who want to get through large amounts of video quickly; for many videos, the image part is not critical (e.g. in "talking head" interviews), but people will still want the audio stream (e.g. in an audio-only situation such as a hands-free Web browsing system in a car). We want to make as much of the video available and usable to as many people as possible.

In short, I recommend against thinking of there being a need for "alternative content for accessibility", the problem is really "making videos usable to all users in all situations".

As far as captions go, I recommend making it one of the features accessible from the UA context menu of the video, along with other controls such as "full screen" and selecting alternate audio streams.
Comment 17 Aaron Leventhal 2007-08-08 19:31:03 PDT
I think that sounds reasonable, if the video format can support it. On the other hand, I'm not a closed captioning or second audio program expert.
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-08-09 04:29:09 PDT
+MOZ_ARG_DISABLE_BOOL(ogg,
+[  --disable-ogg           Disable Ogg Codec support],
+    MOZ_OGG=,
+    MOZ_OGG=1)

We want MOZ_ARG_ENABLE_BOOL so this is disabled by default.

Is the lifecycle of the element and its mDecoder documented somewhere? It's not clear when mDecoder should be created and destroyed, when we start the download, etc. We might need a state machine diagram with transitions. In particular you seem to call ElementAvailable when the element is bound to a document OR when src is set even if it's not in a document.

+ifeq ($(findstring 86,$(OS_TEST)), 86)
+ifneq ($(OS_ARCH),Darwin)
+ifdef HAVE_64BIT_OS
+CSRCS		+= \
+		x86_64/dsp_mmx.c \

Is this going to break on 64-bit Windows?

+XPIDLSRCS	= \
+		nsIAudioStream.idl \

Don't use XPCOM unless either a) we want the interface to be accessible or implementable by non-C++ or b) we want the interface to be *dynamically* pluggable. I don't think either applies in this case, so it should just be plain C++. And we don't need an abstract base class, just a concrete class.

+    /**
+     * Initialize the stream 
+     */
Explain the parameters.

+    /**
+     * Write data to the hardware.
+     */
Say something about the format of the samples.

+		nsHTMLVideoElement.h \

This should live with nsHTMLVideoElement.cpp, not here.

+  /* The OggPlayReader member must be the first item in this
+     class - The OggPlay library requires this class layout */

How about making nsChannelReader inherit from OggPlayReader? Then we can be sure that the layout is correct... Right?

+private:
+  nsCOMPtr<nsVideoDecoder> mDecoder;

nsRefPtr. Also, 'private' is superfluous.

+  void handleVideoData(int track_num, OggPlayVideoData* video_data);
+  void handleAudioData(int track_num, OggPlayAudioData* audio_data, int size);
+
+  void openAudioStream();
+  void closeAudioStream();

Leading capital letter. Some comments on this interface would be good too.

+  nsCOMPtr<nsIThread> mPresentationThread;

A comment would be useful here...

+  // Lock for accessing members across threads
+  PRLock* mLock;

Can you say exactly what state is protected by the lock? And which fields are accessible by which threads?

+  nsChannelReader* mReader;

WHat's the ownership model for this?

+  volatile PRBool mPlaying;
+  volatile PRBool mDecoding;
+  volatile PRBool mPaused;

PRPackedBool. Do these need to be volatile? If they're accessed by multiple threads they should probably be protected by mLock instead.

+  while (available < 300000 && last_available != available) {
+    last_available = available;
+    PR_Sleep(PR_MillisecondsToInterval(1000));

What do these magic numbers do? And what is the purpose of this function? It looks weird ... it loops until no data arrives for one second or we have 300,000 bytes? That could be a long time...

+  nsChannelReader * me = (nsChannelReader *)opr;

If you do the inheritance thing I suggested, you could use static_cast<> here.

+  if (mAudioHandle) {

Make this "if (!mAudioHandle) return..."

+            static_cast<short>(scaled_value);

Use "short(scaled_value)".

+    mLock(0), 

nsnull

+    return mLock ? PR_TRUE : PR_FALSE;

mLock != nsnull

+    while (CanDecode()) {
+      PR_Lock(mLock);

There's a window here where mLock is not held... what if mDecoder changes state during that interval, say so CanDecode would return false?

We're not using std::nothrow elsewhere are we? If not, not much point in starting to use it here.

+PRUint32 nsVideoDecoder::GetVideoWidth() {
+  return mRGBWidth;
+}

These and other trivial getters might as well be inline in the class.

+  nsCOMPtr<nsVideoInvalidateEvent> mInvalidateEvent;
+  nsCOMPtr<nsVideoDecodeEvent> mDecodeEvent;
+  nsCOMPtr<nsVideoPresentationEvent> mPresentationEvent;

Can you use nsRevocableEventPtr to hold these event references and simplify things?

Use nsAutoLock instead of explicit PR_Lock/PR_Unlock.

+    PRInt32 millis = static_cast<PRInt32>(mFramerate);

constructor cast (and elsewhere).

I see that we're still detecting frame size change in ::Paint. This means one or more frames will display at the wrong size. We should detect frame size changes earlier so we'll reflow to the right size before display the video frame.

nsHTMLVideoFrame::GetMinWidth can change its result whenever the presentation event fires. Whenever that changes you need to call MarkIntrinsicWidthsDirty on the frame.

The big thing that I'm worried about that's left to do is fallback for the cases where the format is not supported, similar to the way <object> works. I think we should look at that now at least to figure out what's required.
Comment 19 Anne (:annevk) 2007-08-09 04:33:49 PDT
I think the idea is that <video> does not have fallback content like <object>.
Comment 20 Dão Gottwald [:dao] 2007-08-09 04:49:59 PDT
Yes. From <http://www.whatwg.org/specs/web-apps/current-work/multipage/section-video.html>:

  Content may be provided inside the video element so that older Web browsers,
  which do not support video, can display text to the user informing them of
  how to access the video contents. User agents should not show this fallback
  content to the user. 

... which could be a bad idea, given that vendors are free to support any formats. On the other hand, who would actually make videos available in two or more formats? Real use cases would probably be more like <video src="foo.mov">Get Safari</video>.
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-08-09 05:01:55 PDT
Ah OK. I guess that idea has been superceded by the use of multiple <source> children to specify alternate formats. That's good. Although the spec has grown considerably since I last looked at it, which isn't so good...
Comment 22 cajbir (:cajbir) 2007-08-09 05:26:14 PDT
>Don't use XPCOM unless either a) we want the interface to be accessible or
>implementable by non-C++ or b) we want the interface to be *dynamically*
>pluggable. I don't think either applies in this case, so it should just be
>plain C++. And we don't need an abstract base class, just a concrete class.

I used XPCOM to make it easier for other people to implement different audio backends if Sydney wasn't available for their platform. It also made it easier for me to swap PortAudio and Sydney for testing purposes. I can make it plain C++ inheriting from an abstract base class and get the same effect but you're suggesting not to do that as well? Is the preference to have a concrete class with different implementations used via #defining out other platform code and/or different implementation files?

The same question applies to nsVideoDecoder. I was moving towards having an abstract video decoder class, with Ogg being the provided implementation. So if someone wanted a gstreamer, or other platform specific support, they could just implement that. Is it not worth doing that?

>if you do the inheritance thing I suggested, you could use static_cast<> here.
>...
>Use "short(scaled_value)".

In some places It's suggested to change to using a C++ cast, in others, to a C style casts. What's the rule for deciding which I should be using so I can get it right first time?

>We're not using std::nothrow elsewhere are we? If not, not much point in
>starting to use it here.

One reviewer suggesting using PR_Malloc to avoid the problem of operator new throwing an exception in an out of memory condition which, since we don't use exception handling anywhere, is likely to be bad. You suggested changing back to operator new. To solve both requests I used nothrow. Has operator new been changed in Mozilla to return 0 in out of memory conditions? Is it something that's not worth worrying about?

Thanks for the comments, I'll make the requested changes.
Comment 23 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-08-09 13:16:41 PDT
(In reply to comment #22)
> >Don't use XPCOM unless either a) we want the interface to be accessible or
> >implementable by non-C++ or b) we want the interface to be *dynamically*
> >pluggable. I don't think either applies in this case, so it should just be
> >plain C++. And we don't need an abstract base class, just a concrete class.
> 
> I used XPCOM to make it easier for other people to implement different audio
> backends if Sydney wasn't available for their platform. It also made it easier
> for me to swap PortAudio and Sydney for testing purposes. I can make it plain
> C++ inheriting from an abstract base class and get the same effect but you're
> suggesting not to do that as well? Is the preference to have a concrete class
> with different implementations used via #defining out other platform code
> and/or different implementation files?

Since we don't need multiple audio backends right now, I would not bother making it easy to add new ones. If someone comes along and wants to add one, they can either extend Sydney (the best option), or add #ifdefs, or patch the code to use an abstract base class instead.

> The same question applies to nsVideoDecoder. I was moving towards having an
> abstract video decoder class, with Ogg being the provided implementation. So
> if someone wanted a gstreamer, or other platform specific support, they could
> just implement that. Is it not worth doing that?

That makes more sense since we know we want to support additional decoders in the future.

> >if you do the inheritance thing I suggested, you could use static_cast<> here.
> >...
> >Use "short(scaled_value)".
> 
> In some places It's suggested to change to using a C++ cast, in others, to a C
> style casts. What's the rule for deciding which I should be using so I can get
> it right first time?

Use C++ casts wherever you can, which should be almost everywhere; they're safer. I don't recall why or where you needed to use a C-style cast.

> >We're not using std::nothrow elsewhere are we? If not, not much point in
> >starting to use it here.
> 
> One reviewer suggesting using PR_Malloc to avoid the problem of operator new
> throwing an exception in an out of memory condition which, since we don't use
> exception handling anywhere, is likely to be bad. You suggested changing back
> to operator new. To solve both requests I used nothrow. Has operator new been
> changed in Mozilla to return 0 in out of memory conditions? Is it something
> that's not worth worrying about?

It hasn't been changed, but it needs to be. I guess you can just leave std::nothrow in for now.
Comment 24 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-08-09 13:18:33 PDT
Looking at the latest spec, there's a ton of pseudo code there. We should make sure our code is organized so it looks just like the spec, with functions and variables where the spec (implicitly) defines functions and variables.
Comment 25 Brian Crowder 2007-08-09 14:42:03 PDT
reinterpret_cast<> should give you anything you would need from a C-style cast, should it not?
Comment 26 Andre-John Mas 2007-08-23 09:20:03 PDT
Are there any plans to support the attributes specified by the spec? From my understanding this should be able to work:

<video src="test4_files/transformers480.ogg" controller="true" autoplay="true"/>

The patched version doesn't seem to support this yet?

Comment 27 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-08-23 18:21:49 PDT
The patch is not complete yet.
Comment 28 Ivo Emanuel Gonçalves 2007-08-29 06:48:10 PDT
I'm wondering: what audio backend are you using for Linux?  ALSA or OSS 4.0?  If you are using ALSA, I suggest switching to OSS, as ALSA emulates OSS well and OSS is the default backend of a lot of UNIX and Linux systems.

OSS is also easier to program to.
Comment 29 cajbir (:cajbir) 2007-08-29 08:05:28 PDT
Ivo, I am using OSS for Linux so hopefully that will cover most systems.

Andre-John, Yes I plan to support the rest of the specification, but the current patch does not support the functionality you mention yet. 

Robert, yep, I'm moving to follow the pseudo code closer.
Comment 30 Henri Sivonen (:hsivonen) 2007-09-19 12:00:26 PDT
Re: comment 16, captioning-related URLs from maikmerten on #whatwg:
http://trac.annodex.net/wiki/CmmlSubtitles
http://wiki.xiph.org/index.php/Ogg_Skeleton
http://svn.xiph.org/trunk/writ/

(Annodex and Writ are different.)
Comment 31 cajbir (:cajbir) 2007-09-30 19:01:36 PDT
Created attachment 282950 [details] [diff] [review]
Version 5 of the <video> patch

This patch corrects various review comments and bugs. The main change with this patch is rewriting portions of it to follow the pseudo code of the specification. Supported now is the 'autoplay' attribute and the first frame of the video is loaded initially. The element now works embedded as an SVG foreignObject.
Comment 32 cajbir (:cajbir) 2007-09-30 19:03:00 PDT
Don't forget to apply the third party modules patch first:

http://www.double.co.nz/video_test/third_party_modules.patch.gz

Some example usage of the video element here:

http://www.double.co.nz/video_test
Comment 33 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-01 22:01:14 PDT
+  // Media loading flags. See: 
+  //   http://www.whatwg.org/specs/web-apps/current-work/#video)
+  PRUint16 mNetworkState;
+  PRUint16 mReadyState;

Can we have typedefs for PRUint16 defined for these?

+  PRBool mBegun;
+  PRBool mLoadedFirstFrame;
+  PRBool mLoadedEnoughToPlayThrough;
+  PRBool mAutoplaying;
+  PRBool mPaused;
+  PRBool mSeeking;

Might as well make these PRPackedBool.

You might want to document any useful invariants that hold between these booleans.

You don't need to initialize the decoder in two different places.

Document the relationship between mAutoplaying and the autoplay attribute, in particular, clarify what the difference is between them.

+  double mVolume;
+#if defined(SYDNEY_AUDIO_NO_POSITION)
+  double mPauseTime;
+#else
+  PRInt64 mPauseBytes;
+#endif
+  void* mAudioHandle;
+  int mRate;

Document the units for all these

The methods of nsVideoDecoder need more documentation to indicate on which thread they are called. Maybe group the methods that are called on each thread? Similarly, group the data members according to which thread they are used by. Maybe even put the thread name in the names of the methods.

+  nsCOMPtr<nsVideoInvalidateEvent> mInvalidateEvent;
+  nsCOMPtr<nsVideoDecodeEvent> mDecodeEvent;
+  nsCOMPtr<nsVideoPresentationEvent> mPresentationEvent;

Use nsRevocableEventPtr here so you don't have to explicitly Revoke() when these are nulled out in Shutdown()?

+  PRBool mFirstFrameLoaded;
+  PRCondVar* mFirstFrameCondVar;
+  PRLock* mFirstFrameLock;

Explain what these protect and how the condition variable is used.

+  sa_stream_destroy((sa_stream_t*)mAudioHandle);

static_cast (and elsewhere)

+  *aTime = ((double)PR_IntervalToMilliseconds(PR_IntervalNow()))/1000.0 - mPauseTime;
+    *aTime = static_cast<double>(((bytes + mPauseBytes) * 1000 / mRate / (sizeof(short) * mChannels))) / 1000.0;

Constructor-style-cast

+  nsChannelReader* me = (nsChannelReader*)handle;

static_cast (and elsewhere)

+  Presentation Thread
+    This thread goes through the data decoded by the decode thread
+    and displays it if it is video, or queues it for playing if it
+    is audio. It owns the audio device - all audio operations occur
+    on this thread.

It doesn't really display it, right? We just queue frames to be picked up by the main thread or something.

I think all the comments here should go in nsVideoDecoder.h.

+   As a final step of the first frame event is to notify a condition
+   variable that we have decoded the first frame.

This doesn't quite parse.

+// An Event that uses an nsVideoDecoder object. It provides a
+// way to inform the event that the decoder object is no longer
+// valid, so queued events can safely ignore it.
+class nsDecoderEvent : public nsRunnable {

How about adding a RunWithLock method that subclasses have to override, so that the "lock ... run ... unlock" pattern of Run() can be shared in the superclass?

Specify which thread each event runs on.

+  if (!mRGB) {
+    mRGBWidth = y_width;
+    mRGBHeight = y_height;
+    if (mRGBWidth < MAX_VIDEO_WIDTH && mRGBHeight < MAX_VIDEO_HEIGHT)  {
+      mRGB = new (std::nothrow) unsigned char[mRGBWidth * mRGBHeight * 4];
+    }
+  }
+  else if (mRGBWidth != y_width || mRGBHeight != y_height) {
+    mRGBWidth = y_width;
+    mRGBHeight = y_height;
+    if (mRGBWidth < MAX_VIDEO_WIDTH && mRGBHeight < MAX_VIDEO_HEIGHT)  {
+      mRGB = new (std::nothrow) unsigned char[mRGBWidth * mRGBHeight * 4];
+    }
+    else {
+      mRGB.forget();
+    }
+  }

This could be simplified to avoid the code duplication.

+    PR_Lock(mLock);
+    if (mElement) {
+      nsCOMPtr<nsIRunnable> metadataLoadedEvent = 
+        NS_NEW_RUNNABLE_METHOD(nsVideoDecoder, this, MetadataLoaded); 
+
+      if (metadataLoadedEvent) {
+        NS_DispatchToMainThread(metadataLoadedEvent, NS_DISPATCH_NORMAL);
+      }
+    }
+    PR_Unlock(mLock);

Why do you need mLock and the mElement check here?

+    PR_Lock(mLock);
+    if (mElement) {
+      nsCOMPtr<nsIRunnable> frameLoadedEvent = 
+        NS_NEW_RUNNABLE_METHOD(nsVideoDecoder, this, FirstFrameLoaded); 
+
+      if (frameLoadedEvent) {
+        NS_DispatchToMainThread(frameLoadedEvent, NS_DISPATCH_NORMAL);
+      }
+    }
+    PR_Unlock(mLock);

Ditto

+void nsVideoDecoder::ElementUnavailable()
+{
+  // Need to ensure that we are not using mElement in the invalidate
+  // thread before setting it to null.
+  PR_Lock(mLock);
+  mElement = nsnull;
+  PR_Unlock(mLock);  
+}

The comment doesn't make sense to me. InvalidateFrame is only going to be called on the main thread (right?), and so is this, so this can't be interleaved with InvalidateFrame().

+    PRInt32 millis = static_cast<PRInt32>((mVideoNextFrameTime-audio_time) * 1000.0);
+    PRInt32 millis = static_cast<PRInt32>(mFramerate);

Constructor-style cast

It seems that the patch currently tries to play all audio and video tracks in the file. It should probably play just one audio and one video track.

You can use nsAutoLock all over the place.

+    if (static_cast<PRUint32>(mLastFrameSize.width) != element->GetVideoWidth() ||
+        static_cast<PRUint32>(mLastFrameSize.height) != element->GetVideoHeight()) {

Constructor casts

+nsVideoFrame::Reflow(nsPresContext*           aPresContext,
+                          nsHTMLReflowMetrics&     aMetrics,
+                          const nsHTMLReflowState& aReflowState,
+                          nsReflowStatus&          aStatus)

indentation

+#if 0
+
+  virtual nsSize ComputeSize(nsIRenderingContext *aRenderingContext,
+                             nsSize aCBSize, nscoord aAvailableWidth,
+                             nsSize aMargin, nsSize aBorder, nsSize aPadding,
+                             PRBool aShrinkWrap);
+#endif

delete?

Overall I think this is looking pretty good, I think the architecture is nicely nailed down. It would be good to keep a list of known issues somewhere, probably as a tracking bug.
Comment 34 cajbir (:cajbir) 2007-10-04 21:27:51 PDT
Created attachment 283664 [details] [diff] [review]
Version 6 of the <video> element patch

No new functionality from version 5. Addresses the review comments.
Comment 35 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-04 21:49:24 PDT
+  // The following methods must only be called on the main
+  // thread.

Use ***** or something to make these bracketing comments stand out.

+  PRBool mPaused;
+
+  // True if the first frame of data has been loaded. This member,
+  // along with the condition variable and lock is used by threads 
+  // that need to wait for the first frame to be loaded before 
+  // performing some action. In particular it is used for 'autoplay' to
+  // start playback on loading of the first frame.
+  PRBool mFirstFrameLoaded;

PRPackedBool (and you needn't bother with bitfields, that's a bit more code)

But I think we go for a real review now. I think I can be the reviewer of record for the video and layout parts, and say ... Jonas ... for the content parts.
Comment 36 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-04 21:51:38 PDT
Jonas, this isn't 1.9 material exactly, but I think a lot of us would like to have this landed on trunk (disabled of course), so I wonder if you'd be interested in reviewing the content/DOM side.
Comment 37 Brendan Eich [:brendan] 2007-10-04 22:19:46 PDT
It's fine with me to land this disabled on cvs.mozilla.org, but at some point we should consider using hg.mozilla.org to develop features that could be merged into release vehicles, independently of big-integration-release-conflations as we use on the CVS trunk for each Mozilla milestone (which then moves to a CVS branch).

Here is not the place, but now may be the time. Consider this planting a seed. I'll send mail to the relevant lists/groups shortly, but I hope video can blaze a new trail, away from the ever-lovin' CVS repository.

/be
Comment 38 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-10-05 01:12:39 PDT
Honesly, if this is in a state where we're ready to land it on trunk then I think we should attempt to get it in to FF3. This would be a huge help to keep the internet open by promoting open standards for video at a very crucial state for video on the web. Ideally we would have had this in a release a year ago, but waiting another year or two I think would make it much harder for free codecs to get market share.

Yes, I know we're well past feature freeze, but I think this one is worth making an exception for. It speaks directly toward MoFos goal of promoting freedom and innovation on the internet.
Comment 39 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-10-05 01:14:12 PDT
In other words, yes, I'd be willing to review the contet/DOM parts. On my free time if I have to :)
Comment 40 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-05 12:33:16 PDT
The patch is quite far from being feature-complete with respect to the WHATWG video spec, so I'm still not optimistic about turning it on for FF3. Having it in the tree will make it easier to get help and move it along, of course.
Comment 41 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-05 13:46:54 PDT
Brendan: this patch is very well isolated from existing code so it doesn't really benefit from an hg branch. Also, I think before we start using Mercurial in earnest we should make it easier to produce a working mozilla-trunk build using Mercurial (see http://weblogs.mozillazine.org/roc/archives/2007/08/mercurial.html).
Comment 42 Aaron Leventhal 2007-10-06 09:42:25 PDT
Don't land it on trunk for Firefox 3unless it somehow supports closed captioning. Otherwise it's a section 508 violation.
Comment 43 Mike Shaver (:shaver -- probably not reading bugmail closely) 2007-10-06 10:22:08 PDT
With sync events authors can provide closed captioning streams through page content, if nothing else -- I'd be surprised if sec508 is so specific as to require a given API, so that's likely fine.
Comment 44 Aaron Leventhal 2007-10-06 10:54:14 PDT
No a specific API is required. It should be sufficient to provide a working example with code so people can't replicate the results. Unless it's just plain obvious already from the existing docs.
Comment 45 Ivo Emanuel Gonçalves 2007-10-06 13:27:39 PDT
(In reply to comment #44)
> No a specific API is required. It should be sufficient to provide a working
> example with code so people can't replicate the results. Unless it's just plain
> obvious already from the existing docs.
> 

The container Ogg supports different captioning formats; it's a matter of selecting one from the crowd.  Xiph provides CMML, which is for everything that is timed text.  An example on how to use for captions may be found at [1].

The liboggplay[2], which is a plugin provided by Xiph for Firefox, probably supports CMML already.  If it doesn't, it's a matter of time til it does.

[1] http://trac.annodex.net/wiki/CmmlSubtitles
[2] http://wiki.xiph.org/index.php/OggPlay
Comment 46 cajbir (:cajbir) 2007-10-06 14:28:02 PDT
What API is required by section 508? The only mention in the WHATWG spec related to closed captioning is:

"User agents should provide controls to enable or disable the display of closed captions associated with the video stream, though such features should, again, not interfere with the page's normal rendering."

The patch as it is detects for CMML and sends it to a PR_LOG. Support for displaying the text in the UI needs to be implemented.

It would also be possible to use the 'addCuePoint' functionality to display text at various time intervals which could qualify as an API I guess.
Comment 47 Henri Sivonen (:hsivonen) 2007-10-07 07:44:06 PDT
I'm not familiar with Section 508 requirements and IANAL, so the following comment is from a practical (not legal) point of view:

Requiring the page author to implement captioning in JavaScript based on sync events makes captioning much harder than treating captions as an off-by-default track that the UA can play on its own. For practical accessibility, having closed captioning playback as a built-in feature will probably help people more than an API.

I think the preferable way to address captioning is to figure out how to mark CMML text as off-by-default ("closed") captioning track (as opposed to on-by-default translation subtitles or other text annotation), having a browser-wide user pref for enabling captioning and rendering captions onto the bottom part of the video frame without changing the dimensions of video if (and only if) captions are enabled. (Rendering glyphs in white fill with a black outline seems to be the time-tested styling that works on top of unpredictable colored video.)

The other video accessibility feature is audio description.

For future compat, I think it is important for the first release to have knowledge of which tracks are assistive (captioning or descriptive audio) and should not render by default even if the first release didn't really support those features. This would be to make sure assistive tracks from the future don't play accidentally. (So that if a later release of Firefox gains the ability to play audio descriptions authors don't need to be afraid of the descriptions playing by default in old versions.)

When these issues were discussed on #whatwg last month, I started the following wiki page:
http://wiki.whatwg.org/wiki/Video_accessibility
Comment 48 cajbir (:cajbir) 2007-10-07 13:07:50 PDT
My understanding is that Ogg Skeleton is the Ogg way of specifying information about multiple streams in an Ogg file. The patch currently doesn't support this and I don't know how much Skeleton stuff is out there or software support for it. Maybe Maik Mertern could comment on that.

Currently take the first audio track as the main audio and the first video track as the main video. I output to the log all CMML data. I'll investigate the what Skeleton provides.
Comment 49 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-07 14:26:43 PDT
I think we should get some accessibility support in before shipping this, but I think we should absolutely *land* it without accessibility support. Accessibility support should have its own bug that should block this one.
Comment 50 Ivo Emanuel Gonçalves 2007-10-08 04:54:30 PDT
(In reply to comment #48)
> I don't know how much Skeleton stuff is out there or software support
> for it.

VLC, xine, and some other video players support it.

> I'll investigate what Skeleton provides.

http://wiki.xiph.org/index.php/Ogg_Skeleton

The fourth paragraph gives an overview of other great features that Skeleton has besides informing the implementations of what's inside an Ogg container.

Comment 51 Ivo Emanuel Gonçalves 2007-10-08 04:57:33 PDT
And liboggplay, of course.  Shane Stephens mentioned that some of the code would be used on your Firefox patch.  As such, you may as well take the Skeleton portion.
Comment 52 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-08 15:07:33 PDT
Comment on attachment 283664 [details] [diff] [review]
Version 6 of the <video> element patch

Let's go for review.

I think the imported libraries should live under other-licenses. Not sure about the best locations beyond that.
Comment 53 maikmerten 2007-10-09 09:20:02 PDT
As already pointed out Ogg Skeleton is "just" a codec providing information about Ogg streams without actually having to analyze the whole Ogg file.

Chris started a rather interesting thread about captioning on the WHATWG mailing list. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-October/012642.html

I think the most straight-forward approach would be to work with Annodex to officially specify captions in CMML, as apparently Chris already does extract CMML out of Ogg files. (See "Handling in CMML" and the proposed alternatives over at http://trac.annodex.net/wiki/CmmlSubtitles )

Another approach perhaps would be to develop an Ogg mapping for W3C Timed Text. This may be a not that much of a low hanging fruit, though.
Comment 54 Gervase Markham [:gerv] 2007-10-09 14:41:33 PDT
(In reply to comment #52)
> I think the imported libraries should live under other-licenses. Not sure about
> the best locations beyond that.

Any imported library under a license compatible with the MPL, LGPL and GPL (e.g. BSD-like licences) can live anywhere in the tree. We already have code across the tree under BSD-like licences (e.g. image codecs). I assume that all the code we are planning to use falls into this category, given that it may well be shipped by default? 

Gerv
Comment 55 Andre-John Mas 2007-10-09 14:58:46 PDT
In the case of corrupt or missing video, what should the right behaviour be? Since there is no 'alt' attribute, there is either the displaying of a broken video icon or displaying nothing. If we decide to go along the route of an icon, do want to differentiate between a video that could not be located and one of the wrong format?
Comment 56 Hixie (not reading bugmail) 2007-10-09 15:07:16 PDT
The video element never fallsback, nor does it ever display nothing. When no video can be played, the element should render transparent against its black background, at the same size it would use to play if it had video (the size of the element is independent of the video played).

You may display icons or text to report errors to the user in that black area, though, especially if the "controls" attribute is set (or scripting is disabled). There are also various events that fire when there are errors.

HTH. Let me know if you need the spec clarifying.
Comment 57 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-09 15:55:10 PDT
(In reply to comment #54)
> Any imported library under a license compatible with the MPL, LGPL and GPL
> (e.g. BSD-like licences) can live anywhere in the tree. We already have code
> across the tree under BSD-like licences (e.g. image codecs). I assume that all
> the code we are planning to use falls into this category, given that it may
> well be shipped by default? 

You're right, these libraries are actually all BSD so we can put them anywhere in the tree. That actually makes it harder to decide where to put them... I'd say probably under modules/ except for comment #10.

> The video element never fallsback, nor does it ever display nothing. When no
> video can be played, the element should render transparent against its black
> background, at the same size it would use to play if it had video (the size of
> the element is independent of the video played).

I don't understand. You mean we should never use the size of the video source as the intrinsic size of the element? It seems like it would be very useful to make the size of the video source be the intrinsic size of the element, and consistent with <img> too.
Comment 58 Hixie (not reading bugmail) 2007-10-09 16:12:17 PDT
Right; the idea was that the element would be a video viewport sized independently of the content, allowing arbitrarily sized video data to be put in the element without the content resizing. Video content is different from image content in that it is rarely played 1:1, and in that it doesn't actually suffer anywhere near as badly from being resized. Note how very few online video Web sites bother to resize their viewports to fit their video.
Comment 59 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-09 16:23:35 PDT
But if I was writing an app and was creating video to use as a component of the app, I'd be more likely to play it 1:1 and giving it an intrinsic size could be useful.

I don't see a downside to giving video a intrinsic size.
Comment 60 Hixie (not reading bugmail) 2007-10-09 16:36:46 PDT
Actually if it's part of the UI you'd be more interested in having it fit the widgets around it than having it be exactly 1:1 -- the fact that it was 1:1 would be secondary (and would drive the size of the media, not the size of the UI).

I don't really see an _up_side to giving video an intrinsic size.

The downside is that without a stylesheet, video elements keep changing size. This is a pretty big downside IMHO.
Comment 61 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-09 17:04:28 PDT
(In reply to comment #60)
> Actually if it's part of the UI you'd be more interested in having it fit the
> widgets around it than having it be exactly 1:1

I don't see how you can claim that will be universally true.

> I don't really see an _up_side to giving video an intrinsic size.

I can tell you from experience that having a 1:1 default intrinsic size is useful when banging out test cases. 1:1 will be the right default whenever the video is being generated for a particular page. I think therefore authors will find it a convenient starting point. It's also consistent with how images work, including JPEG images which have the same fuzzy-resize characteristics as video.

> The downside is that without a stylesheet, video elements keep changing size.
> This is a pretty big downside IMHO.

Worse than having a fixed size chosen arbitrarily by the browser, which could even vary across browsers? Not so I*M*HO.
Comment 62 Hixie (not reading bugmail) 2007-10-09 17:42:02 PDT
> I don't see how you can claim that will be universally true.

Imagine what you'd want to have happen if you changed your UI zoom level. Would you want the video to scale as well, or would you want the video to remain 1:1? Why would your answer differ for a non-standard zoom level than for the usual "100%" zoom level case?


> I can tell you from experience that having a 1:1 default intrinsic size is
> useful when banging out test cases.

I don't think designing for test scenarios is the best way forward here. There are plenty of features that I'd have put in specs if I were optimising for writing tests. To start with, all UAs would have to ship with the Ahem font. :-)


> 1:1 will be the right default whenever the video is being generated for a 
> particular page.

Are we assuming that will be the common case? I don't think it necessarily is. I don't think it's necessarily even the case with images.

The changing size problem is not to be dismissed easily, either. What dimensions do you use for the empty case? What do you do when the native video size is too big to fit the viewport or even the device, or too small to be useful? In almost all cases, authors will be setting the size manually from CSS anyway, why make the non-CSS case worse?


> I think therefore authors will find it a convenient starting point. It's also 
> consistent with how images work, including JPEG images which have the same 
> fuzzy-resize characteristics as video.

<video> and <img> have very different characteristics and use cases. For example, it is an expected use of <video> that it will start empty and be filled in later, or that it will be switched from resource to resource without changing dimensions. I think the image comparison here is a red herring -- I think <video> is more like <iframe> and HTML.


> > The downside is that without a stylesheet, video elements keep changing 
> > size. This is a pretty big downside IMHO.
> 
> Worse than having a fixed size chosen arbitrarily by the browser, which could
> even vary across browsers? Not so I*M*HO.

I'm all in favour of picking a standard size that all browsers follow. It would at least guarentee that when CSS isn't enabled, the user gets a decent usable video player even if the video has dimensions widely out of proportion of those of the output device (which is actually quite likely to be common in practice, given the sizes of consumer digital video cameras -- tiny -- and the size of HD content -- huge).

(I recommend 560x315 CSS pixels, since that makes 16:9 a round number and is in the ballpark of your typical Web video player's size. Given a 96dpi screen, it's far below 1:1 in terms of any sort of commercial video formats: even NTSC is bigger than that at 720x480 native pixels.)


I'm all for allowing the user to change the zoom level (100%, 200%, Fit), or pop out the video into a resizable window that can default to 1:1, or to be played fullscreen; and indeed the spec encourages this. But I don't see the page being reflowed every time the video loads being a good thing.
Comment 63 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-09 21:40:21 PDT
(In reply to comment #62)
> > I don't see how you can claim that will be universally true.
> 
> Imagine what you'd want to have happen if you changed your UI zoom level.
> Would you want the video to scale as well, or would you want the video to
> remain 1:1? Why would your answer differ for a non-standard zoom level than
> for the usual "100%" zoom level case?

What "UI zoom level"?

If I was using <video> in a UI I'd expect it to behave just like an <img>. That means that I can use its intrinsic size to constrain part of my layout when that's convenient.

> > I can tell you from experience that having a 1:1 default intrinsic size is
> > useful when banging out test cases.
> 
> I don't think designing for test scenarios is the best way forward here.

Sorry, that wasn't really my point. My point was that having an intrinsic size minimizes the cost of getting started when you'll be working with 1:1-scale video.

> The changing size problem is not to be dismissed easily, either. What
> dimensions do you use for the empty case?

0x0

> What do you do when the native video
> size is too big to fit the viewport or even the device, or too small to be
> useful? In almost all cases, authors will be setting the size manually from
> CSS anyway, why make the non-CSS case worse?

I don't think using the intrinsic size is making it worse, that's the point.

> <video> and <img> have very different characteristics and use cases. For
> example, it is an expected use of <video> that it will start empty and be
> filled in later, or that it will be switched from resource to resource without
> changing dimensions. I think the image comparison here is a red herring -- I
> think <video> is more like <iframe> and HTML.

I don't.

I think you're thinking of the main use of <video> being stuff like Youtube, Google Video, etc.

I'm thinking that at least as big a use, probably much bigger, will be ads, page decorations, and UI bling, exactly the sort of things that images are used for today.

> I'm all in favour of picking a standard size that all browsers follow.

Then you'll have the "too big to fit in the viewport/device" problem that you worried about above.

> It would
> at least guarentee that when CSS isn't enabled, the user gets a decent usable
> video player even if the video has dimensions widely out of proportion of
> those
> of the output device (which is actually quite likely to be common in practice,
> given the sizes of consumer digital video cameras -- tiny -- and the size of
> HD content -- huge).

Right, this is you focused on Youtube again.

> But I don't see the
> page being reflowed every time the video loads being a good thing.

It needn't, if the video starts loading early enough that it's before we start displaying the page.

But you know what? I don't care. Have it your way. Put 560x315 in the spec and we'll deal. Unless Chris cares more than I do.
Comment 64 Hixie (not reading bugmail) 2007-10-09 21:56:19 PDT
> What "UI zoom level"?

Bug 4821-style zooming.


> If I was using <video> in a UI I'd expect it to behave just like an <img>. 
> That means that I can use its intrinsic size to constrain part of my layout 
> when that's convenient.

Can you give an example of when you would do this? I don't really follow. What kind of UI would you imagine doing this?


> I think you're thinking of the main use of <video> being stuff like Youtube,
> Google Video, etc.
> 
> I'm thinking that at least as big a use, probably much bigger, will be ads,
> page decorations, and UI bling, exactly the sort of things that images are 
> used for today.

<video> isn't supposed to be a rendering-level primitive, it's supposed to be an element to embed an image into content. The rendering-level primitives belong in visual languages like SVG. This is like how <img> shouldn't be used for visual effects and backgrounds. For that we have CSS, SVG, etc. At least that has been my understanding to this point, and that has been the focus in the API design.

If we're trying to do something that's a video primitive, then the <video> definitions are going to need a lot of work, IMHO.


> > I'm all in favour of picking a standard size that all browsers follow.
> 
> Then you'll have the "too big to fit in the viewport/device" problem that you
> worried about above.

Devices that are smaller than the default size would scale to fit, much like we require in CSS2.1 today for replaced elements.


> It needn't [reflow], if the video starts loading early enough that it's before 
> we start displaying the page.

I think, given the size and weight of video content, that in general I wouldn't expect this to be the common case.


> But you know what? I don't care. Have it your way. Put 560x315 in the spec and
> we'll deal. Unless Chris cares more than I do.

I'm not trying to fight you here, I'm trying to work with you to come up with the optimal solution for the Web. My apologies if you misconstrued my debating here as being some sort of authoritarian play.
Comment 65 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-09 22:25:36 PDT
> > If I was using <video> in a UI I'd expect it to behave just like an <img>. 
> > That means that I can use its intrinsic size to constrain part of my layout 
> > when that's convenient.
> 
> Can you give an example of when you would do this? I don't really follow. What
> kind of UI would you imagine doing this?

<button onclick="this.firstChild.play()"><video src="..."></button>

> > I think you're thinking of the main use of <video> being stuff like Youtube,
> > Google Video, etc.
> > 
> > I'm thinking that at least as big a use, probably much bigger, will be ads,
> > page decorations, and UI bling, exactly the sort of things that images are 
> > used for today.
> 
> <video> isn't supposed to be a rendering-level primitive, it's supposed to be
> an element to embed an image into content. The rendering-level primitives
> belong in visual languages like SVG. This is like how <img> shouldn't be used
> for visual effects and backgrounds. For that we have CSS, SVG, etc. At least
> that has been my understanding to this point, and that has been the focus in
> the API design.
> 
> If we're trying to do something that's a video primitive, then the <video>
> definitions are going to need a lot of work, IMHO.

Really? I'd think "video primitive" would need strictly few features.

But now your position makes a lot more sense to me (although I think the distinction will be lost on most Web authors, especially when the CSS and SVG alternatives aren't implemented or don't work for their use cases, like the above example).

> > It needn't [reflow], if the video starts loading early enough that it's
> > before  we start displaying the page.
> 
> I think, given the size and weight of video content, that in general I
> wouldn't expect this to be the common case.

We only have to read up to the point where we have the size of the first frame. This should be pretty quick if the container format doesn't suck.

> > But you know what? I don't care. Have it your way. Put 560x315 in the spec
> > and we'll deal. Unless Chris cares more than I do.
> 
> I'm not trying to fight you here, I'm trying to work with you to come up with
> the optimal solution for the Web. My apologies if you misconstrued my debating
> here as being some sort of authoritarian play.

I apologise for being snarky. But I genuinely don't much care about this issue.
Comment 66 cajbir (:cajbir) 2007-10-09 23:11:00 PDT
Before I implemented the implicit sizing, the number one thing people asked me when I demoed the video element was, how do you get it to automatically adjust to the size of the video.

If there is no width/height given for the video in the style how can you do this? Without Javascript?

Most people I demoed to assumed that it would automatically adjust to the size. Especially given that finding the exact size of a video can be a pain if all you want to do is throw a video on a webpage.

Do you think it's worthwhile raising this in the whatwg mailing list, or at clarifying the spec so it says what to do in the absence of size information?

Comment 67 Schuyler Duveen 2007-10-10 06:35:20 PDT
I (a lowly video web developer looking on with great enthusiasm) saw the conversation here, and thought I'd give some more use cases where <video> having an implicit size would help a LOT:

1. max-width, max-height CSS (the current web design style is very boxy and there is less white space, but this isn't always the case)

2. Over-scaled video can easily start appearing pixelated to the point that it's harder to see the video than in the original (smaller) scale.

3. I'm sure the <video> element will be implemented without a single bug in Firefox!  However, for example, experience with Quicktime proved to me that it's sometimes very hard to auto-rescale video EVEN WITH javascript.  Factors that come in to play is when the javascript loads, when the video content (or enough of it loads).  In Quicktime, it often reported corrupt scale information before it fully loaded (even though the 'status' said it fully loaded), etc.  

The main point is that you save the web developer that wants the 'default' scale a LOT of trouble by defaulting to it.
Comment 68 Hixie (not reading bugmail) 2007-10-10 14:24:06 PDT
Ok well based on this discussion and the discussion on #whatwg and elsewhere, I think we're going to go with having the element autosize to fit content when it has content loaded, as well as having height/width attributes to provide a hint to the UA about what the content's size will be (integer values, to be interpreted as CSS pixel units).

I'm not sure what we should do with an empty <video> though. Maybe it should default to 300x150, like other replaced content?
Comment 69 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-10 14:37:19 PDT
yay

> I'm not sure what we should do with an empty <video> though. Maybe it should
> default to 300x150, like other replaced content?

Sounds reasonable to me (and Chris).
Comment 70 Ian Pottinger 2007-10-13 14:10:30 PDT
Standard video has a 4:3 aspect ratio. The new widescreen format (HDTV, anamorphic DVD's, wide DV, etc.) has a 16:9 display aspect ratio. An empty <video> with one or the other aspect ratio as default would be more "reusable."
Comment 71 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-17 16:46:14 PDT
+  void Buffer();

This needs a better name, one that's more clearly a verb.

+  unsigned char* GetVideoRGB();

This needs a comment explaining the format and that it reflects the current frame.

We also need synchronization because this is read and written by different threads (along with video width and height), as we just discussed.
Comment 72 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-17 17:07:39 PDT
+  PRPackedBool mBegun : 1;

These probably shouldn't be bitfields, just plain PRPackedBools.

+  // Return the time in seconds that the video display is
+  // synchronised too.

"to"

+  // Framerate of video being displayed in the element 
+  double mFramerate;
+
+  // The time of the current frame from the video stream 
+  float mVideoCurrentFrameTime;
+
+  // The time that the next video frame should be displayed 
+  double mVideoNextFrameTime;
+  double mInitialVolume;

What are the units of these values? time measured in seconds? what's the reference point?

+  PRBool mPaused;
+  PRBool mFirstFrameLoaded;

PRPackedBool

+PRLogModuleInfo* gAudioStreamLog = nsnull;
+
+nsresult nsAudioStream::InitLibrary()
+{
+  gAudioStreamLog = PR_NewLogModule("nsAudioStream");

Need some #ifdef PR_LOGGING here, preferably with a LOG() macro that gets defined to PR_LOG or nothing.

+  // TODO: Close streams, etc here?

What about this? Is this leaking?

+    // TODO: Handle Stream Failure

Or this?

+// Logging object for decoder
+PRLogModuleInfo* gVideoDecoderLog = nsnull;

more PR_LOGGING #ifdef

+    if (mDecoder && mDecoder->StepDecoding()) {
+      if (mDecoder)

Comment that StepDecoding may have cleared mDecoder if the video finished, otherwise this test looks redundant.

+    if (mDecoder && !mDecoder->IsPaused() && mDecoder->StepDisplay()) {
+      if (mDecoder)

Ditto

+  // TOD: return NaN

TODO?
Comment 73 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-18 15:12:25 PDT
+        mFramerate = fpsd == 0 ? 0.0 : (double)((double)fpsn/(double)fpsd);

constructor casts

+        oggplay_set_offset(mPlayer, i, 250L);

+    if (mVideoTrack == -1) {
+      oggplay_set_callback_num_frames(mPlayer, mAudioTrack, 2048);
+    }
+    
+    oggplay_use_buffer(mPlayer, 20);

Document what these constants mean, or better still, make them #defines or named constants.

+    for (int i = 0; i < num_tracks; ++i) {
+      OggPlayDataType type = oggplay_callback_info_get_type(track_info[i]);
+      OggPlayDataHeader ** headers = oggplay_callback_info_get_headers(track_info[i]);

Why do we need this loop? We already know what track numbers we're interested in. Can't we just directly pull them out of track_info? Maybe this is because the number of tracks can change, but if that's so, then aren't the tests based on the video and audio track numbers incorrect?

+    PRInt32 millis = PRInt32(mFramerate);

Shouldn't this be 1000.0/mFrameRate or something?

+    nsPresContext* context = PresContext();
+    if (context) {

context cannot be null here.

+    nsRefPtr<gfxASurface> surface = 
+    nsRefPtr<gfxPattern> pat = new gfxPattern(surface);

test surface and pat for null here

+    // Resize if the video's size has changed. This can occur midstream
+    // in Ogg Theora files due to the way they can be concatenated together
+    // to form one video file.
+    if (PRUint32(mLastFrameSize.width) != element->GetVideoWidth() ||
+        PRUint32(mLastFrameSize.height) != element->GetVideoHeight()) {

This will all need to be rethought to avoid races with reflow, as we discussed.

+      nsIPresShell *presShell = context ? context->GetPresShell() : 0;
+      if (presShell) { 

Call context->PresShell(), and it can't be null either.

+nscoord nsVideoFrame::GetPrefWidth(nsIRenderingContext *aRenderingContext)
+{
+  // XXX The caller doesn't account for constraints of the height,
+  // min-height, and max-height properties.
+  nscoord result = GetVideoSize().width;
+  DISPLAY_MIN_WIDTH(this, result);

DISPLAY_PREF_WIDTH

+  // Defaulting size to 320x240 if no size given.
+  int width = 320;
+  int height = 240;

Change that to whatever Hixie wants

That's all!
Comment 74 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-18 15:23:44 PDT
+    mInitialVolume(1.0),

Shouldn't this be 0.5 according to the spec?

> Initially, the volume must be 0.5
Comment 75 cajbir (:cajbir) 2007-10-24 21:47:33 PDT
Created attachment 286116 [details] [diff] [review]
Version 7 of the <video> element patch

All changes mentioned in roc's review comments made.
Comment 76 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-25 00:49:40 PDT
+// The number of entries in oggplay buffer list
+#define OGGPLAY_BUFFER_SIZE 20
+
+// The number of frames to read before audio callback is called.
+#define OGGPLAY_FRAMES_PER_CALLBACK 2048
+
+// Offset into Ogg buffer containing audio information
+#define OGGPLAY_AUDIO_OFFSET 250L

This is good, but can you say something about why these values have been chosen?

It seems like if the Ogg stream changes the track assignments, we might start playing two audio tracks or something. Is there anything we can or should do about this?

There is one remaining problem. The decoder thread can change the video frame size when it gets a new frame. We now have locking so that change can't interfere with a single call to nsHTMLVideoElement::GetVideoSize, but we can still have a situation where, for example, GetMinWidth can return one thing, and then GetPrefWidth can return another thing less than the min-width because the decoder thread changed the video size in between. This is bad.

I think the only solution to that is to store a size in the element and return that from GetVideoSize, and only update the stored size in the ChangeFrameSize event. Does that make sense?
Comment 77 cajbir (:cajbir) 2007-10-29 19:33:00 PDT
Created attachment 286633 [details] [diff] [review]
Version 8 of the <video> element patch

Fixes the remaining issues raised in the review. 

- The values were chosen based on the sample code provided by liboggplay.
- liboggplay doesn't support chained ogg files at this stage so there is no issue regarding track assignment changing. This will have to be addressed when liboggplay supports it and provides a way of identifying that the tracks have changed.
- The issue with the video frame size changing has been fixed.

Also included are:

- Videos with no sound were not playing at the correct speed
- Moved the ogg libraries from content/media/modules to modules as discussed.

As a result of the location of the modules changing the patch for the Ogg libraries has been updated:

http://www.double.co.nz/video_test/third_party_modules.patch.gz
Comment 78 cajbir (:cajbir) 2007-10-29 22:34:36 PDT
Created attachment 286644 [details] [diff] [review]
Version 9 of the <video> element patch

This fixes up some small things raised by Robert in a face-to-face review. A change to the GetVideoSize interface and locking. Added makefiles to the toolkit makefile list.
Comment 79 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-30 00:30:53 PDT
Comment on attachment 286644 [details] [diff] [review]
Version 9 of the <video> element patch

+nsIntSize nsHTMLVideoElement::GetVideoSize(nsIntSize defaultSize)
+{
+  if (mDecoder) 
+    return mDecoder->GetVideoSize(defaultSize);
+}

Need to return defaultSize on the else path. Or just write it as
  mDecoder ? mDecoder->... : defaultSize;

Other than that, it's looking good. Over to you Jonas!
Comment 80 Hixie (not reading bugmail) 2007-10-30 00:52:02 PDT
By the way, the spec is pretty stable now. I recommend checking code and tests against it. If you have any feedback, now is a very good time to send it, as at least one other group is also implementing it and it is always much easier to get changes done before people check code in.
Comment 81 cajbir (:cajbir) 2007-11-02 04:09:33 PDT
Created attachment 287084 [details] [diff] [review]
Version 10 of the <video> element patch

- Fixes the missing default case roc mentioned in the last patch.
- Uses the threadsafe verison of ISupports implementation, fixing various assertions
- Adds a fix from roc that allows the videos to scale when using the nifty zooming ui.
Comment 82 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-11-04 13:04:58 PST
Comment on attachment 287084 [details] [diff] [review]
Version 10 of the <video> element patch

Sorry, restore Jonas review request
Comment 83 cajbir (:cajbir) 2007-11-05 17:49:12 PST
Created attachment 287477 [details] [diff] [review]
Version 11 of the <video> element patch

Minor fix to nsVideoDecoder.h to stop a link error on Windows builds.
Comment 84 cajbir (:cajbir) 2008-03-11 21:03:14 PDT
Comment on attachment 287477 [details] [diff] [review]
Version 11 of the <video> element patch

I've obsoleted the patch and removed the request for review. I'm about to submit a new patch soon and there have been quite a few changes. No point reviewing the existing code with that coming.
Comment 85 cajbir (:cajbir) 2008-03-12 16:25:01 PDT
Just a heads up that I'm refactoring the bug and patch. The patch that will soon be updated here will add support for the video element with no codecs. Bug 422538 adds support for Ogg Theora playback and bug 422540 adds support for a GStreamer backend.
Comment 86 cajbir (:cajbir) 2008-03-12 20:54:57 PDT
Created attachment 309026 [details] [diff] [review]
version 12 of the video element patch

Implements the WHATWG <video> element only, with no decoder. See bug 422538 and 422540 for decoders. Compile with configure flag --enable-video. Videos appear as a black area.
Comment 87 dolphinling 2008-04-07 11:05:30 PDT
Has this bitrotted? I can build current cvs fine, but when I apply the patch, I get this error:

/home/andrew/firefoxvideoblank/mozilla/content/html/content/src/nsHTMLSourceElement.cpp: In member function 'virtual nsresult nsHTMLSourceElement::QueryInterface(const nsIID&, void**)':
/home/andrew/firefoxvideoblank/mozilla/content/html/content/src/nsHTMLSourceElement.cpp:97: error: 'eDOMClassInfo_HTMLSourceElement_id' was not declared in this scope
gmake[7]: *** [nsHTMLSourceElement.o] Error 1

This is on an x86_64 system (no 32bit libs) with gcc 4.1.2 and my mozconfig is
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-@CONFIG_GUESS@
ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-video
Comment 88 cajbir (:cajbir) 2008-04-07 16:52:46 PDT
I just tested applying the patch to trunk and building and it worked fine on Linux x86_32. Do you run autoconf2.13 after applying the patch, before running make?
Comment 89 dolphinling 2008-04-07 22:57:49 PDT
Ah! The problem was that I wasn't running autoconf.

(In my defense, it didn't say to anywhere... though maybe I should have known)

Thanks!
Comment 90 cajbir (:cajbir) 2008-05-28 16:02:16 PDT
Created attachment 322868 [details] [diff] [review]
Version 13 of the video element patch

This patch was generated against mozilla-central. Major additions are <audio> and  progress events.
Comment 91 Olli Pettay [:smaug] 2008-05-28 16:20:57 PDT
ProgressEvent spec isn't stable yet. At least I'm hoping
that .loaded will be renamed.
HTML5 mentions that
"Firing a progress event called e means something that hasn't yet been defined,
in the [PROGRESS] spec. "

But ofc we can change the interface/implementation later to follow the spec.
Comment 92 cajbir (:cajbir) 2008-05-28 16:31:34 PDT
re ProgressEvent, yeah I included it because people were asking for it so they could do their ui's and I think WebKit also includes an implementation.
Comment 93 Damon Sicore (:damons) 2008-06-04 16:45:23 PDT
marking wanted1.9.1? to get this in the triage queue.  If this needs to be blocking1.9.1?, please mark it as so.
Comment 94 :Aryeh Gregor (working until September 2) 2008-06-04 16:49:40 PDT
I assume you meant wanted1.9.1?, as you said and as you did on the other bugs, not wanted1.9.1-.
Comment 95 Damon Sicore (:damons) 2008-06-04 16:58:29 PDT
yep.  Thanks for correcting.  :)
Comment 96 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-06-10 12:51:16 PDT
+dnl = Enable HTML video support
+dnl ========================================================
+MOZ_ARG_ENABLE_BOOL(video,
+[  --enable-video           Enable HTML5 video support],
+    MOZ_VIDEO=1,
+    MOZ_VIDEO=)
+
+AC_SUBST(MOZ_VIDEO)
+
+if test -n "$MOZ_VIDEO"; then
+    AC_DEFINE(MOZ_VIDEO)
+fi

We talked about enabling video when at least one backend is available. Are you going to change this to do that when we land the first backend? Also should --enable-video and MOZ_VIDEO actually be --enable-media and MOZ_MEDIA since it controls audio too?

+  // the 'autplay' attribute being set. A true value indicates the 

'autoplay'

I hope we don't really need to check all out parameters for null. jst?

Currently all your event dispatch is synchronous. This is risky since during that event dispatch scripts can do arbitrary things, including changing the state of this media element. So for example when Load calls ChangeReadyState(nsIDOMHTMLMediaElement::DATA_UNAVAILABLE) or DispatchSimpleEvent(NS_LITERAL_STRING("emptied")), anything could happen. Whenever possible you should dispatch events asynchronously, but I guess the spec doesn't let you do that in many cases. You may need a way to detect if the state changes under you and do things correctly if it has (and similarly for other places where you do work after a synchronous event dispatch). (For example, FirstFrameLoaded probably shouldn't fire canshowcurrentframe if loadedfirstframe changed something so that we can't show the current frame...) If the spec doesn't say what you should do in such situations, request feedback.

You should request feedback to get cues removed from the spec since Apple wants to change them anyway. Then you can remove them from this implementation.

It seems we don't currently honor the playback rate. Better document that.

+      if (NS_SUCCEEDED(source->HasAttribute(NS_LITERAL_STRING("src"), &hasSrc)) && hasSrc) {
+        nsAutoString type;
+
+        nsresult rv = source->GetAttribute(NS_LITERAL_STRING("type"), type);

There are simpler HasAttr and GetAttr methods available on nsIContent that you should use.

+        if (NS_SUCCEEDED(rv)) {
+          // Check type and instantiate for relevant decoders
+          // Currently no decoders supported

So we're just going to have hardcoded backend checks here? That means you can't add backends dynamically via an extension. That may be OK but how hard would it be to use the XPCOM category manager to enumerate backend components and call some interface here?

+  nsCOMPtr<nsIDocument> doc = GetOwnerDoc();

You probably want GetCurrentDoc.

What is supposed to happen if a <source> element is added to or removed from the DOM, or if its 'src' attribute is changed? We don't seem to do anything to detect that.

Some of the methods in nsVideoDecoder need to say whether they're main thread only or not.

Up to nsVideoDecoder.cpp.

Needless to say, the big missing thing here is tests. This really needs a lot of tests (including tests for scripts that do strange things during event dispatch, for example). I guess we need a backend for meaningful tests, so I suggest that we make tests that use Ogg Vorbis --- that will let us test most of the features cross-platform. We won't be able to land them until we land the Vorbis backend but I'd like to see the patch here anyway before we land this patch.
Comment 97 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-06-10 15:44:04 PDT
+  return mRGBLock && PR_TRUE;

We usually use mRGBLock != nsnull.

+  nsRect r = frame->GetRect();
+  r.x = r.y = 0;

nsRect r(nsPoint(0,0), frame->GetSize());

mRGBLock protects more than just the RGB buffer. For example nsVideoDecoder::Invalidate uses it. So we need a better name, and better documentation for it.

How does mRGB get reallocated when the video frame size changes? It doesn't seem to.

+  if(!mRGB)

Generally we have a space after the if. This needs to be fixed in many places.

Should we make nsIDOMHTMLVoidCallback be implementable using a raw JS function? It's trivial, see
http://mxr.mozilla.org/mozilla-central/source/dom/public/idl/events/nsIDOMEventListener.idl#52
Maybe that needs to be proposed as an extension to the spec; actually I'm surprised it's not already in the spec.

Instead of nsIVideoFrame just cast to nsVideoFrame after a GetType() check on the frame.

+  if (GetPrevInFlow()) {
+    nscoord y = GetContinuationOffset(&aMetrics.width);
+    aMetrics.height -= y + mBorderPadding.top;
+    aMetrics.height = PR_MAX(0, aMetrics.height);

This (and GetContinuationOffset) should be removed since you never break the video vertically across pages.

+video[controls] xul|videocontrols xul|button.controlbar {
+  visibility: hidden;
+}
+
+video[controls] xul|videocontrols xul|button.centerplay {
+  visibility: visible;
+}
+video[controls] xul|videocontrols.playing xul|button.centerplay {
+  visibility: hidden;
+}
+
+video[controls]:hover xul|videocontrols.playing xul|button.controlbar {

It's more efficient to use direct parent selectors ">" instead of descendant selectors as here.

Not really important, but it seems like some of the SVG could be cleaned up. For example
+		 cx="218.9404"
+		 cy="219.7715"
+		 r="150.7063"
+		 fx="218.9404"
+		 fy="219.7715"
could be rounded to integers and
> stroke-opacity:0.986014;stroke-width:0.981612;
we could just use "1" (and drop stroke-opacity altogether since 1 is the default)
+		 x1="0.000000"
+		 y1="0.000000"
+		 x2="1.000000"
+		 y2="0.000000"
These are the defaults so they could be removed.
I think all the SVG "id" attributes could be removed.
+		   gradientTransform="translate(-6.10265e-09,4.28376e-08)"
I don't think we care about translation by less than one millionth of a pixel.
+		   gradientTransform="matrix(1,-1.28068e-23,-1.04254e-22,1,-2.17915e-08,-2.49062e-08)"
Likewise, this is infinitesimally different from the identity matrix so we should drop it.
+		   rx="0.000000"
+		   ry="0.000000" />
These are defaults, drop them
+		 transform="matrix(-3.82182e-18,-1,1,-3.82182e-18,-21.9997,520.689)">
The nearly-zero components of this matrix should be zeroed.
Comment 98 Damon Sicore (:damons) 2008-06-10 16:23:00 PDT
wanted1.9.1+, P1.
Comment 99 Sylvain Pasche 2008-06-10 16:47:57 PDT
(In reply to comment #96)
> Needless to say, the big missing thing here is tests.

WebKit tests may be of interest: http://trac.webkit.org/browser/trunk/LayoutTests/media

Stanford Security team have developed an extension for running these tests http://crypto.stanford.edu/websec/cross-testing/. I'm working on an enhanced version to automate the test loading and verifying.
Comment 100 cajbir (:cajbir) 2008-07-01 23:33:20 PDT
Created attachment 327750 [details] [diff] [review]
version 14 of the video element patch

Updated to address review comments.

--enable-video and MOZ_VIDEO have been changed to --enable-media and MOZ_MEDIA respectively. When the first backend is landed this will be changed to automatically --enable-media when that backend is enabled.

The changes requested by Apple for cues doesn't seem to be going ahead based on Hixies reply to the whatwg list. So for now I'll keep the Cues JS api in there.

playbackRate now works, including defaultPlaybackRate.

Currently backends are hardcoded and can't be implemented by extensions. I can look at changing that in a future bug/patch. It interacts with how the selection of the codec is done and what to do if multiple backends support the same codec, etc so needs some more thought.

I'll look into the issue of adding/removing source elements dynamically and changing source element attributes.

Tests are coming in a separate patch.

mRGBLock has been changed to mVideoUpdateLock since it needs to be obtained when the video data is being updated (the raw RGB data, width or height).
Comment 101 qengli 2008-07-02 19:55:14 PDT
Hi, Double
  I have applied patch14 in xulrunner source code, but it could not work, the video element didn't load the source video, so the display is just a black area. I track the program and found the mDecoder in nsHTMLVideoElement is null, even the nsVideoDecoder::nsVideoDecoder() is not called. 
here is my snippet testing xul code:
	<html:video 
		src="file://e:/media/interview.ogg" 
		id='media1' 
		style="width: 320; height: 240;"/>
the same code can run with firefox download from http://www.double.co.nz/video_test/firefox-3.0pre-video-ogg.en-US.win32.zip
Comment 102 cajbir (:cajbir) 2008-07-02 20:03:29 PDT
gengli, this patch does not include a 'back end' to decode video. For that you need the patch from one of the bug's for the backend implementations (see the 'blocks' portion of this bug for the list).

Currently though they are all out of date with respect to patch14. The gstreamer backend patch will be updated shortly to work, and the ogg one won't be too far behind.
Comment 103 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-08 18:01:01 PDT
Comment on attachment 327750 [details] [diff] [review]
version 14 of the video element patch

- In nsDOMProgressEvent::GetLengthComputable(PRBool* aLengthComputable):

+  if (aLengthComputable)
+    *aLengthComputable = mLengthComputable;

This null check can be left out. Anyone calling this method from C++ and passes in a null pointer deserves the (non-exploitable) crash they'll trigger. Same goes for all DOM interface implementations (i.e. other methods in the same file etc). Historically we've recommended adding the checks so the tree is full of them, but it's really not worth the extra code.

- In nsDOMProgressEvent::GetLoaded(PRUint32* aLoaded):

Sorry if this has already been discussed and decided against, but shouldn't this really be a 64-bit value rather than a 32-bit value (here and in other places where we're dealing with file size)? It's not that far stretched IMO to expect this to be used with 4GB+ files down the road, and limiting the size in the API here seems like a bad thing to me. Similar issues already came up with the plugin scriptability API (NPRuntime) which doesn't support integers larger than 32-bits.

I hear Smaug's been all over this issue in the spec world, so he should be in the loop on deciding one way or another here.

I'll defer to roc for taking a close look at the gfx/layout changes in this patch.

r=jst with that taken into account.
Comment 104 cajbir (:cajbir) 2008-07-08 22:01:51 PDT
Created attachment 328623 [details] [diff] [review]
Address review comments by jst

This patch should be applied on top of the existing version 14 patch. It addresses the review comments by jst in comment 103. I put it in a separate patch to make it easy to see the changes.

Only change is to remove the null checks. Depending on the outcome of the ProgressEvent spec queries I'll change that to an int64 in a later bug.
Comment 105 cajbir (:cajbir) 2008-07-08 22:09:28 PDT
Created attachment 328624 [details] [diff] [review]
Some tests for the video element patch

This adds some mochitests for the video element patch. It currently only tests the basics of some of the video element attributes. Many of the events, etc don't work or make sense without a backend so I'll add those with the first backend to be ready.
Comment 106 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 00:36:20 PDT
(In reply to comment #97)
> How does mRGB get reallocated when the video frame size changes? It doesn't
> seem to.

Did this comment get addressed? I don't see how.

> Should we make nsIDOMHTMLVoidCallback be implementable using a raw JS
> function?
> It's trivial, see
> http://mxr.mozilla.org/mozilla-central/source/dom/public/idl/events/nsIDOMEventListener.idl#52
> Maybe that needs to be proposed as an extension to the spec; actually I'm
> surprised it's not already in the spec.

What's the story here?
Comment 107 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 01:16:17 PDT
Comment on attachment 328623 [details] [diff] [review]
Address review comments by jst

Looks good to me and since they're trivial, I'll usurp jst's review.
Comment 108 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 01:21:58 PDT
Comment on attachment 327750 [details] [diff] [review]
version 14 of the video element patch

I fixed the size issue myself (Chris, check my change to SetRGBData when I check in). The other issue is a spec issue really.
Comment 109 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 01:24:01 PDT
I checked in the patch. I'll deal with the tests shortly.
Comment 110 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 02:47:15 PDT
The patch needed one fix ... some stuff in nsPresShell.cpp needed #ifdef MOZ_MEDIA.

Which made me realize that since MOZ_MEDIA isn't currently defined, we can't check these tests in yet.

Should we enable MOZ_MEDIA on trunk? Or do we need to wait for a backend to land?
Comment 111 ojab 2008-07-09 08:40:11 PDT
Will there be native themes for video/audio controls (gtk-media-* icons in nix, for example)?
Comment 112 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 11:04:17 PDT
That would probably be pretty easy to do, but no-one's signed up to do it.
Comment 113 Dão Gottwald [:dao] 2008-07-09 11:14:05 PDT
Would be easier if it were pixel images rather than SVG.

By the way, did somebody review videocontrols.xml? The indentions are messed up, and more importantly the labels aren't localized.
Comment 114 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-09 13:47:33 PDT
That would be me. Sorry about that, I guess we should have got a toolkit person to review it. Please do the honours, I'm sure Chris will fix it in a timely manner :-).
Comment 115 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-07-10 21:37:56 PDT
(In reply to comment #110)
> Should we enable MOZ_MEDIA on trunk? Or do we need to wait for a backend to
> land?

Of course Chris already mentioned that once we land one or more backends, we'll make --enable-media automatically happen as long as one or more backends are enabled.
Comment 116 Biju 2008-08-05 23:28:00 PDT
Bug 336164, Bug 422538, Bug 422540, Bug 435298, Bug 435339 are not under component Video/Audio
Comment 117 Eric Shepherd [:sheppy] 2008-10-09 15:38:05 PDT
Working on docs for audio and video, and have a couple of questions:

1. Does the server have to send files with correct MIME types for these to work?

2. What codecs/formats are supported?  Do we support QuickTime and/or WMV as appropriate by platform?
Comment 118 cajbir (:cajbir) 2008-10-09 15:54:25 PDT
1) Yes the server has to send files with correct MIME types. Otherwise <source> elements can't be picked based on the type.
2) Vorbis for audio and Theora for video in an Ogg container. Currently we don't support any other type. Other backends are in progress at various states of completion.
Comment 119 Eric Shepherd [:sheppy] 2008-10-09 16:24:11 PDT
I've now documented video and audio here:

http://developer.mozilla.org/En/HTML/Element/Video
http://developer.mozilla.org/En/HTML/Element/Audio

Please feel free to have a look and tell me if I have anything badly wrong; in particular, I'm not sure we implement all the attributes laid out in the spec, so if we don't, please let me know.
Comment 120 cajbir (:cajbir) 2008-10-09 17:30:08 PDT
The following attributes are not implemented for Video/Audio, but mentioned in the docs:

start
poster
playcount
loopstart
loopend
end

There are bugs for these. controls is not implemented for audio. This was raised in the bug related to implementing controls.

Time offsets are currently float values, representing number of seconds to offset. This part of the HTML 5 spec has not yet been completed.
Comment 121 Eric Shepherd [:sheppy] 2008-10-09 22:59:06 PDT
I've revised the documentation based on comment #120; please let me know if there are any issues remaining.  Thanks!
Comment 122 cajbir (:cajbir) 2008-10-10 00:23:54 PDT
Looks good! In the examples for the audio section the autoplay attribute is defined differently in one example. Some have autoplay="true" and another has autoplay="1". Maybe it should be autoplay="autoplay" or just autoplay:

<audio autoplay src='....'>

It seems that some people are thinking that autoplay="false" means don't autoplay, which is incorrect. Ditto with controls. Using 'true' in the examples might propogate this.
Comment 123 Eric Shepherd [:sheppy] 2008-10-10 07:33:41 PDT
Updated the examples to just say "autoplay" instead of assigning a value.
Comment 124 Eric Shepherd [:sheppy] 2008-10-23 15:18:40 PDT
There's now additional documentation here, which provides some extra information about the events that can be sent and how to work with them, among other things.  Please feel free to twiddle if anything's inaccurate, or make suggestions for other details that might be good to add:

https://developer.mozilla.org/En/Using_audio_and_video_in_Firefox
Comment 125 Chris Hubick 2008-10-23 16:06:12 PDT
Is there a bug or documentation regarding the <source> child element of <video> ( http://www.whatwg.org/specs/web-apps/current-work/#the-source-element )?

I switched my page's Theora <video> element from using a "src" attribute to using a <source> child element, and it broke.  I couldn't find any examples using the <source> element to see if I was doing something wrong.

Thanks.
Comment 126 cajbir (:cajbir) 2008-10-23 16:13:04 PDT
<source> is bug 449363. There is limited support there. It will pick the first source element that has a type of exactly "video/ogg" or "application/ogg". So:

<video>
  <source src="foo.ogg" type="video/ogg"></source>
  <source src' foo.mov" type="..."></source>
</video>

This worked last time I tried it. It will play the .ogg in Firefox and the .mov in Safari.
Comment 127 Aaron Train [:aaronmt] 2008-11-18 09:55:04 PST
Hello,

I was wondering as to which aspects of this patch have been tested so far according to the test plan found at: https://wiki.mozilla.org/QA/Firefox3.1/Video_And_Audio_TestPlan

I am willing to jump in to contribute in the testing for this patch (i.e., unit tests).

Thanks,
Aaron
Comment 128 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-03-04 14:28:10 PST
I'm closing this. The basic implementation work here is done. There are a lot of followup bugs and enhancements, but we don't need a tracking bug for those, since we have the Video/Audio component.
Comment 129 Tony Chung [:tchung] 2009-03-04 15:02:53 PST
Verified.  
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090304 Minefield/3.2a1pre
Comment 130 Fernando Hartmann 2009-04-14 05:35:47 PDT
I have some question about the the video tag (I hope this is the right place) I'm running FF 3.1 b3 and noticed some odd behavior,Today I assessed http://www.rumblingedge.com/2009/04/14/nus-cs3108-final-presentations-on-video/
and realized that the use of my 1 MBps ADSL link goes to 100%, after some research discovered that was the videos tags in the page. There wasn't any information on the videos that they are in download process.
Is this behavior correct ? 
It seams to me that the correct was the download start begin only when I ask to.
With this behavior FF can easily eat all the bandwidth  of my office and the user is not alerted about this.
Comment 131 cajbir (:cajbir) 2009-04-14 05:57:53 PDT
See bug 464272 and bug 479863. They cover the behaviour you are seeing and what is being done about it.

Note You need to log in before you can comment on or make changes to this bug.