Closed Bug 490705 Opened 15 years ago Closed 14 years ago

Support Audio Data API: Get, Manipulate, Play & Save

Categories

(Core :: Audio/Video, enhancement)


Tracking


RESOLVED FIXED
mozilla2.0b5

People

(Reporter: dale, Assigned: humph)


Details

(Keywords: dev-doc-complete)

Attachments

(1 file, 25 obsolete files)

123.88 KB, patch
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2a1pre) Gecko/20090428 Minefield/3.6a1pre
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2a1pre) Gecko/20090428 Minefield/3.6a1pre

It would be nice to have a basic getSampleData/putSampleData type interface for audio, e.g. audioCanvas.getSampleData(videoElement|audioElement);

This feature has been mentioned a few times on lists such as whatwg: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-July/015335.html
Ian Hickson, who is a lead on the HTML5 spec work, mentioned that "Experimental implementations would go a long way towards demonstrating what is possible and what should be specified".

I thought I would add a bug to help track any efforts toward experimental implementations in Firefox.

Reproducible: Always
Component: General → Video/Audio
Product: Firefox → Core
QA Contact: general → video.audio
I would not create any kind of audio canvas, but just add additional APIs to <audio> and <video>.  getAudioSampleData/putAudioSampleData would live nicely on <audio> (putSampleData could have a side effect of uncompressing any compressed audio stream in memory so that the put can succeed, for example).
What sort of use cases are there? For put/get audio sample data are you expecting some sort of virtual buffer that contains the data and then can be played?

You can do this now for the 'put' by building data URLs and using WAV. You can't get access to existing audio data from an audio element though.

Is there the need to manipulate audio data as it plays? Maybe registering a callback on an audio element that provides the sound data to be played, letting the callback change that, etc?
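(For reference, here is a minimal sketch of the data-URL 'put' workaround described above. The WAV header layout is standard RIFF/PCM; the helper name and constants are illustrative only, not part of any proposed API.)

// Build a playable data: URI from raw float samples (illustrative helper only).
function samplesToWavDataUri(samples, sampleRate) {
  var numChannels = 1, bytesPerSample = 2;
  var dataSize = samples.length * bytesPerSample;

  function u32(n) {  // 32-bit little-endian
    return String.fromCharCode(n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff);
  }
  function u16(n) {  // 16-bit little-endian
    return String.fromCharCode(n & 0xff, (n >> 8) & 0xff);
  }

  var header =
    "RIFF" + u32(36 + dataSize) + "WAVE" +
    "fmt " + u32(16) + u16(1) + u16(numChannels) +
    u32(sampleRate) + u32(sampleRate * numChannels * bytesPerSample) +
    u16(numChannels * bytesPerSample) + u16(8 * bytesPerSample) +
    "data" + u32(dataSize);

  var body = "";
  for (var i = 0; i < samples.length; i++) {
    // Clamp [-1, 1] floats and convert to signed 16-bit PCM.
    var s = Math.max(-1, Math.min(1, samples[i]));
    body += u16((s < 0 ? s * 0x8000 : s * 0x7fff) & 0xffff);
  }
  return "data:audio/wav;base64," + btoa(header + body);
}

// Usage: one second of a 440 Hz tone.
var rate = 44100, tone = [];
for (var i = 0; i < rate; i++) {
  tone.push(Math.sin(2 * Math.PI * 440 * i / rate));
}
new Audio(samplesToWavDataUri(tone, rate)).play();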
(In reply to comment #2)
> What sort of use cases are there? For put/get audio sample data are you
> expecting some sort of virtual buffer that contains the data and then can be
> played?
> 
> You can do this now for the 'put' by building data URL's and using WAV. You
> can't get access to existing audio data from an audio element though.

Yeah, the get is key, and building a WAV file as a data URL is a little ugly... especially if you have a lot of data.  Think of the equivalent of doing an image filter in JS over a canvas -- running an audio filter over a loaded audio chunk.

Or creating a new audio chunk of N seconds, grabbing chunks from different audio sources, running an effect over them, then putting them back... and then playing the resulting sound.

> Is there the need to manipulate audio data as it plays? Maybe registering a
> callback on an audio element that provides the sound data to be played, letting
> the callback change that, etc?

That could be interesting for sure, but I think it's two different use cases.  Having that capability would simplify some of the uses of the above, at the expense of CPU per play, as opposed to one chunk of CPU work, and maybe a memory expense.
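(To make the "grab chunks, run an effect, put them back" idea concrete, a minimal sketch of the mixing step over plain arrays of float samples; how such chunks would be obtained and played back is exactly what the proposed get/put API would provide.)

// Mix two equal-length chunks of float samples in [-1, 1] into a new chunk,
// fading the second source in linearly, and clamp to avoid clipping.
function mixWithFade(chunkA, chunkB) {
  var out = new Array(chunkA.length);
  for (var i = 0; i < chunkA.length; i++) {
    var fade = i / chunkA.length;              // linear fade-in for the second source
    var mixed = chunkA[i] + chunkB[i] * fade;  // mix by summing samples
    out[i] = Math.max(-1, Math.min(1, mixed)); // clamp to the valid range
  }
  return out;
}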
Status: UNCONFIRMED → NEW
Ever confirmed: true
By way of introduction, I am the original author of the 'audio canvas' idea, as proposed on the WHATWG list, and have been invited by Chris Double to contribute to the present discussion.

The whole thread started with this

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-July/015322.html

and it says "...making <audio> more like <canvas>...", i.e. I agree with Vladimir no new element is needed (audio canvas is just a catchphrase).

Some use cases were sketched here by myself, Vladimir and others:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-July/015335.html
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-July/015378.html

"For the sketched use case - in-browser audio editor -, functions on 
sample regions from {cut/add silence/amplify/fade} would be nice [...]"

But maybe this excerpt from my email to Ian Hickson gives a clearer picture of a use case? Note that it, too, mentions the data URL approach:

"My use case would be an audio editor in a web browser.

Suppose you want to determine start and end of a suitable piece of
speech under auditory control. You get e.g. a WAV audio file from your
webserver, then draw an oscillogram time-domain visualization of the
audio in a canvas object and place a semitransparent DIV over it to
represent the selected time interval. Now when you want to listen to
your selection to get it right, you need to select a subinterval of the
entire WAV for playback.

(I did a prototype of this for Safari, which worked after a lot of
hacking, since 'end' cannot be used currently as it is buggy and adds
0.4s extra sound, cf. https://bugs.webkit.org/show_bug.cgi?id=19305  - I
was forced to use data URI mangling on a known simple audio format,
16-bit PCM WAV.)

For speech and music editing, this would have to be sample-accurate. [...]

I guess my general point is there is much more to audio than just
listening to music - precluding the whole professional audio world from
web applications would just force them to use proprietary plugins. My
plea is for that world not to be completely forgotten in favor of the
music-listening crowd.

In that regard, the new Flash 10 plugin has dynamic sound
editing/generation capabilities, see
http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html.
However, I would rather go with open web standards, especially since
Javascript is getting so fast these days.
"

A summary of the discussion and more use cases for dynamic audio in the web browser is here:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-October/016719.html

I think data URI manipulation is bad for audio because strings are immutable in JS and audio can be big, hence creating lots of garbage to be collected. One needs more efficient builtin primitives for random access to audio parts and in-place modification.

And I agree with Vladimir that manipulation of audio _while_ it plays can be put on the backburner, perhaps for a long time.

Finally, there is an open-source project without any code yet:
http://code.google.com/p/web-audio-editor/
First, thank you Markus for your detailed response and overview of in-browser audio manipulation.

'Audio Canvas' is a confusing term, especially since you can (as Vladimir suggests) just add an API to manipulate the audio element directly, so I have renamed the bug:

"Support Audio Data API: Get, Manipulate, Play & Save"

The new title reflects what I think we want. We want to be able to get audio data from audio or video elements, manipulate the PCM data, play it back, and/or output that PCM data to a server. Ideally it would optionally support compression, similar to canvas's saveAsJpeg and saveAsPNG; i.e., we should be able to output the audio as Ogg Vorbis.

Once we support these temporal data manipulations, I wonder how that metaphor extends to video, in terms of sequencing image frames back into a temporal video stream.

I should mention j^ has been working on a JS driven video sequencer that uses drawWindow functionality from an extension to grab frames of video at a given DOM state, it then renders the frames together into an ogg theora video. Maybe that use case can also be considered as part of these temporal data get/set operations for audio/video. *this is a separate bug though* 

Architecturally, what is the plan moving forward? Will we continue to do one-off solutions for all these features, or adopt some type of media framework? E.g. the cross-platform GStreamer is maturing... (are there any other frameworks out there worth considering? Is GStreamer a viable candidate?)

A GStreamer back-end could blow the media feature set wide open and provide a platform for quickly moving forward on everything from in-browser video conferencing to video in OpenGL.
Summary: Support Audio Canvas with <audio> or <video> as Input → Support Audio Data API: Get, Manipulate, Play & Save
Hey guys, I am SOOOOOOO glad people are talking about this as I feel the same way. I've been writing music and coding audio apps for quite some years and was honestly shocked when I read the sparse W3C outline. It makes sense that it was sparse, as these are new territories for the web, and making these things work the same for everyone is no small task; obviously it is more important to get the basics solid before tackling the more artistic/technical stuff.

BUFFER ACCESS: I'm sure a lot of this code is already in place, as you are streaming the audio and creating buffers already. Hopefully this is being done at a low enough level that you can allow JavaScript to get and put data to the buffer?

I would like nothing more than to start making comet-based music software for the browser that lets people from different parts of the world synthesize music together on one page.

One issue when moving in this area is timers. You have to make sure you're using really good timers or things will start to unravel. Does anyone know what is being used at present? I remember from one of John Resig's posts that JavaScript timers are very unreliable, which is not "such" a big issue, as you would just change the buffer size to compensate (adding latency); however, at the C++ level you will need better timers than the ones the OS gives you as standard (at least that is true on Windows systems; I've never developed for Macs).

I wonder what the barrier is for getting my hands on the source for Firefox and experimenting?
There is no barrier other than the ability to perform a Google search.
Hah!  Nice one there, Robert.

So hey, how do we kick this thing off?  What do we need to do real audio editing?  And at what level?  Mixing clips or exposing raw PCM data to JS?  (csound in the browser, anyone?)
Maybe expose nsAudioStream to JavaScript so JS can create audio streams, write to it, etc?

It'd be worthwhile looking at existing Flash solutions in this space I suspect.
Personally, I think we have much more important things to work on right now.
Robert, I understand that this isn't in the critical path, but if we can come up with a reasonable API, I might give this a shot to try and get it in.  Thomas had some ideas in bug 532492, which I'll ask him to repeat here regarding how Flash does this now.
The following is copied from this bug: 
https://bugzilla.mozilla.org/show_bug.cgi?id=532492

The bug ( more accurately, the missing feature ) is that the Firefox browser
does not offer the ability to analyze the spectrum data contained in an audio
tag.

I currently have this ability to analyze sound data with Flash, but I don't
want to use flash.  This holds me back from doing the work that I love to do in
Javascript and HTML5, and until this feature is implemented I'll have to use
flash.

Proposed Javascript API:

A proposed API could probably work similar to Actionscript 3's native
SoundMixer API ( more information here http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/flash/media/SoundMixer.html), though it could also be implemented differently as well.

The main method of the Actionscript SoundMixer is computeSpectrum(), which returns a byteArray of the values of both the left and right channels, with or without a Fourier Transform.  

In most uses, this method is called every frame ( 30 fps ) so performance is a
consideration to keep in mind.  Usually this data is enough to create an
engaging sound and visual experience.  With Javascript and the Processing.js library one can take that data and do something like this:
http://modern-carpentry.com/workshop/html5/jsmusic

Another part of the API would be the ability to programmatically set the values
of the sound channels, or manipulate them.  This would enable developers not
only to remix other sounds on-the-fly, but it also could have accessibility
implications, such as JavaScript text-to-speech technology.
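(As an illustration of how a computeSpectrum-style call would typically be consumed, here is a hedged sketch of a per-frame visualizer. The computeSpectrum() method on the audio element is hypothetical here and simply mirrors the ActionScript method named above.)

// Hypothetical: audio.computeSpectrum() returns an array of magnitudes in [0, 1].
var canvas = document.getElementById("vis");
var ctx = canvas.getContext("2d");
var audio = document.getElementById("song");

setInterval(function () {
  var spectrum = audio.computeSpectrum();   // hypothetical API, mirrors ActionScript
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  var barWidth = canvas.width / spectrum.length;
  for (var i = 0; i < spectrum.length; i++) {
    var h = spectrum[i] * canvas.height;    // scale magnitude to canvas height
    ctx.fillRect(i * barWidth, canvas.height - h, barWidth, h);
  }
}, 1000 / 30);                              // roughly 30 fps, as in the Flash usage above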
I've spent some time on this, and have a working prototype--I am working on my blog vs. this bug while I learn so as to not spam people here (http://vocamus.net/dave/?cat=25).  One question that I'm struggling with and could use some guidance on is the right way to expose the array of data to content.  I've been studying various approaches to this, and currently I have made an nsIDOMAudioData object that provides .length and .item(n), and wraps an nsTArray.  I can't find a good way to simply expose an array as an attribute of the event object, but this may just be my own ignorance.  If some of you with more knowledge about how to expose arrays to the DOM could give me a hint, it would help a lot.

I'll refine my patch some more then post it here once it's more reviewable.
Vlad, can we reuse the WebGL data array stuff?
Absolutely, and should -- it's in bug 532774 and just needs one more review from brendan to land.  Associated tracing and other pieces will come shortly after.
This isn't ready for review, but a number of people have asked for it (I did this against mozilla-central rev 683dfdc4adf0).  You can see a cool demo using it here: 

http://vimeo.com/8525101

which is a video of the patch being run against this:

http://weare.buildingsky.net/processing/dft.js/audio.html

I'll keep experimenting and clean this up before doing a proper patch.  I'm hoping to rewrite to use Vlad's WebGL arrays as suggested, once they land.
Here's a better patch, which adds a native fft, and the audio event now returns the raw framebuffer as well as the calculated spectrum.  Still not ready for review (this is totally unoptimized and doesn't have proper error handling yet) but quite usable and good for experiments.  I'm still just getting things to work as we do more tests.

As previously, this will apply cleanly against rev 683dfdc4adf0.  I'll update when I switch over to webgl arrays.
Attachment #419908 - Attachment is obsolete: true
Is there a spec for the additions somewhere?
Not yet--honestly we've been learning and trying things as we go, knowing that this will need more input and guidance going forward.  Do you have suggestions about where this should be written up?  After a couple more iterations and some perf tests, I'll try to get this ready for review, and as I do write something.

For now, this patch adds a new event to audio/video named audiowritten (the name needs to change, I realize).  It can be used like so:

<audio src="song.ogg" onaudiowritten="audioWritten(e);" />
<script>
function audioWritten(e) {
  // The framebuffer will be 4096 or more float elements in length
  var fb = e.mozFrameBuffer;

  // The spectrum is pre-calculated, and will be 1024 float elements in length
  var s = e.mozSpectrum;

  // Elements in each can be accessed as follows:
  console.log("FrameBuffer Length=" + fb.length + " [" + fb.item(0) + ", " + fb.item(1) + ", " + fb.item(2) + ", " + fb.item(3) + ", ...]");
  console.log("Spectrum Length=" + s.length + " [" + s.item(0) + ", " + s.item(1) + ", " + s.item(2) + ", " + s.item(3) + ", ...]");
}
</script>

You can see a better example of it being used here:

http://weare.buildingsky.net/processing/dft.js/audio.html

I have not worked on grabbing all the audio yet, and will need to look into this more.  Ideas and suggestions are welcome.
Hmm.. that seems like a pretty odd API in a lot of ways.  The audio element already has events on it (I believe) to find out how much of the audio is available etc; given that, why not take a <canvas> like approach?  That is, something like:

data = a.getAudioData(start, end);
a.putAudioData(data, start, end);

with an exception being thrown if start..end is out of range of what's already loaded.  Also perhaps an a.setLength(x) to set an explicit length to expand/contract the buffer.

I also don't think the spectrum should be part of the API; if someone needs the spectrum, they can calculate it in JS using the data they obtain.
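(A sketch of how the canvas-style proposal above might look in use; getAudioData/putAudioData are proposals in this comment, not implemented APIs, and the units of start/end are assumed to be seconds.)

// Hypothetical canvas-style audio API: read a region, scale it, write it back.
var a = document.getElementById("clip");
var start = 0, end = 2.0;                // first two seconds (units assumed to be seconds)
var data = a.getAudioData(start, end);   // proposed: returns an array of float samples

for (var i = 0; i < data.length; i++) {
  data[i] *= 0.5;                        // simple gain reduction as the "filter"
}

a.putAudioData(data, start, end);        // proposed: write the filtered samples back
a.play();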
I've looked at all of https://developer.mozilla.org/En/Using_audio_and_video_in_Firefox#Media_events and didn't see what I wanted (perhaps I'm missing it).  The thinking was to make it possible to have visualization work in sync with the audio as it's being played.  What you describe would be awesome too, but I'm not sure how you'll then sync that such that what's happening visually is timed to the sound events.
(In reply to comment #21)

> I also don't think the spectrum should be part of the API; if someone needs the
> spectrum, they can calculate it in JS using the data they obtain.

No, that would very much defeat the foreseen application of realtime visualizations that need a spectrum. You would have to demonstrate that an FFT written in JS is competitive in speed with a native implementation, which seems rather unlikely (slow JS bitops, array access, etc.).

IMHO Flash 10 introduced native spectrum calculation for exactly that reason (see my comment #4 for a link).

Besides music visualization, it can find application in audio spectrograms (http://en.wikipedia.org/wiki/Spectrogram) - wouldn't it be cool to do this in a browser?!
(In reply to comment #23)
> (In reply to comment #21)
> 
> > I also don't think the spectrum should be part of the API; if someone needs the
> > spectrum, they can calculate it in JS using the data they obtain.
> 
> No, that would very much defeat the foreseen application of realtime
> visualizations that needs a spectrum. You would have to demonstrate that a FFT
> written in JS is competitive in speed with a native implementation, which seems
> rather unlikely (slow JS bitops, array access, etc.).

I'd actually turn that burden of proof around on you -- you should demonstrate that FFT written in JS is too slow, taking advantage of all the optimizations on offer (e.g. typed arrays, landing soon).  If it is too slow, then we should figure out why and fix it; math banging like that should be blazingly fast.

> IMHO Flash 10 introduced native spectrum calculation for exactly that reason
> (see my comment #4 for a link).
> 
> Besides music visualization, it can find application in audio spectrograms
> (http://en.wikipedia.org/wiki/Spectrogram) - wouldn't it be cool to do this in
> a browser?!

Sure, there are lots of cool things that we could do here, but we can't and shouldn't add them all to the browser native code.  As much as we can do in JS we should, because then content isn't dependent upon the browser providing functionality or fixes.
> I'd actually turn that burden of proof around on you -- you should demonstrate
> that FFT written in JS is too slow, taking advantage of all the optimizations
> on offer (e.g. typed arrays, landing soon).  If it is too slow, then we should
> figure out why and fix it; math banging like that should be blazingly fast.

Yeah, totally agree: we need good numbers here, and we are working on that.  We have pure js fft and now pure c++ fft, so we'll do some tests and let the data tell us the tale.  Note: the video I posted above that Corban made is pure js fft, so it can be done.  More soon...
Vlad asked before if we had documented this, and now we have:

https://wiki.mozilla.org/Audio_Data_API

We submitted this to the dev track for WWW2010 in order to get some more discussion going around the idea of exposing audio data.  I'm also posting this here for the same reason.  We'd love to figure out the right path forward, and need help to get there.
In the current draft the user specifies a number of channels. There is an example piece of code that does audioOutput.mozSetup(2, 44100, 1); This may seem alright at first, but turns out to be a bad idea. We recently went through this while designing a piece of desktop audio software.

The number of channels is irrelevant because "2 channels" does not imply (left, right). More importantly, "3 channels" could mean either (left, right, subwoofer) or an ambisonic setup. The first has a separate channel for low tones that are to be played back with a dedicated bass speaker, while ambisonic systems use the same number of channels to produce a 3-dimensional sound experience. In the different cases the data sent through the channels is completely different, and the playback hardware and speaker placement differ as well.

If support for different playback environments is desired, we should replace the channel number with a string naming a playback setup standard. This could be e.g. "mono", "stereo", "ambisonic", "dolby5.1" or such. After this change the original example would look like:
audioOutput.mozSetup("stereo", 44100, 1);

If we however give up the idea that browsers should support more complicated environments than mono and stereo, we could just decide that it is always stereo. Most sounds are stereo these days, so mixing stereo down for hardware with only one speaker is probably implemented everywhere. Also, one can simply compute a mono sound buffer and then refer to the same buffer for both channels, so this does not add to the amount of computation or memory consumption in the application.
I forgot to post this earlier, and am just putting it here for completeness' sake.  This is the one we've built all our demos with, since it adds the audio write code.  I haven't touched the WebGL array issue, and think I'll leave it until I get some more feedback and direction on where to take this.
Attachment #420577 - Attachment is obsolete: true
My two cents on this discussion:

I've tinkered a lot with Flash's ability to generate sound, including synthesis, sampling, mixing, etc., and with creating HTML/JS interactive objects on top of these features, although on currently closed-source projects. Nevertheless, I've tried to explore some of the limits of what features a browser should expose with respect to audio generation. You can find this experiment at http://github.com/ruidlopes/SoundCanvas. It uses a barebones Flash object that dispatches "generate" events to the browser's JavaScript layer. While it works, the lack of threading kills the browsers running the experiment. But one is able to hear a bit of a sine wave in the demo, though :)

Based on my experience, I have little to add to the "reading" part of the Audio Data API - which does make sense to me as it's currently specified. However, I do have some fundamental concerns with the current state of the proposed "writing" part of the API (I'll call this the "writing API", from now on).

As I believe you share this opinion, my gut feeling is that the greatest potential for the writing API concerns the development of synthesisers, sound studios, virtual musical instruments, etc. Therefore, three crucial keystones are fundamentally bound to this: flexibility, accuracy, and performance. IMHO, none of them can be left out in high-quality audio applications.

I believe in JavaScript's ability to build complex stuff on top of simple APIs provided by browsers. Therefore, the greatest flexibility lies in providing the simplest thing to write: an array of floats. While this has been taken care of by the writing API through the mozWriteAudio() method, it strikes me as awkward to use setInterval() to keep the pace of generating audio data. This way of doing things breaks both accuracy and performance. First, it's a known fact that JavaScript timers are less than accurate; thus, calling setInterval(writeData, 10) will trigger writeData() with some delay. This will have a profoundly negative consequence for creating virtual musical instruments, which depend on "real time". Second, the performance hit of JavaScript timers will put an unnecessary computational burden on the browser.

My take on the writing API is to make it closer to what's been specified for Flash 10: add a "generate" event to the Audio object, which can be bound at application level. Whenever it's required, the writing API triggers the generate event, which should be akin to Flash's SampleDataEvent: an array to which audio data should be written, and a position in the audio stream for accuracy purposes. This way there's no need for JavaScript timers or complicated machinery that limits the potential of creating complex audio applications in the browser. A quick hack at how this could be specified/used (pardon me if it has errors; it's coded from the brain, straight to the keyboard):

var audio = new Audio();
audio.addEventListener("generate", function(e) {
  for (var i = 0; i < e.data.length / 2; i++) {
    var f = Math.sin((i + e.position) / Math.PI / 4);
    e.data[i*2] = f; // left channel
    e.data[i*2+1] = f; // right channel
  }
}, true);

(Please bear in mind that this simple example doesn't cope with 5.1 channels, etc., but the idea still applies.) The browser's writing API would execute the bound event handler and, internally, process the e.data array efficiently and feed it to the audio driver. The e.data array is created internally by the browser and fixed at a given buffer length provided, e.g., at mozSetup().

IMHO, this writing API is easily comprehended by application developers and, by posing no limits on how sound data is created, anyone can build synthesisers, instruments, audio modellers, etc., thus leveraging the potential of having rich, complex audio-based Web applications.

Finally, the only thing that I (intentionally) left out regarding sound processing concerns mixing. It's a highly sought-after feature on audio writing APIs, constantly being used by audio applications. We might have something to say on this too, since mixing audio buffers is really computationally intensive... We might not want to perform it at Javascript level. But I'll leave this out of the discussion, for now :)

Thanks to everyone taking part in this quest... I really really would like to see this pushed onto HTML5-ish fluff. Thoughts? :)
I agree with Rui Lopes: JavaScript timers are not accurate enough. The Chrome team have invested some time into high-resolution timers. The whole JavaScript-to-C++ poly-resolution-mapping paradigm is something that music software has had to deal with since the invention of MIDI.

A MIDI control has 128 possible values. Imagine I had my volume control set to 0, and had a sound that I was mixing in over the duration of 10 seconds. If I was to mix this volume to a level of 10, how often would the volume increase?

In a poorly designed system, the volume would increase once every second. This would lead to artifacts where the values stepped up. But the audio data is a float type and can contain many different values, meaning the most accurate way to represent changes in value would be to use linear easing.
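(A small sketch of the linear easing being described, applied per sample across a buffer so a volume change does not step audibly; the function and variable names are illustrative.)

// Ramp volume linearly from startGain to endGain across one buffer of samples,
// rather than jumping in coarse steps (which produces audible "zipper" artifacts).
function applyVolumeRamp(buffer, startGain, endGain) {
  for (var i = 0; i < buffer.length; i++) {
    var gain = startGain + (endGain - startGain) * (i / buffer.length);
    buffer[i] *= gain;
  }
}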

So to riff on a few things Rui Lopes said: you could have a C++ layer that was high-resolution and made the appropriate calculations for the JavaScript. It does not matter so much what is handed to the JavaScript, as there will always be some kind of window lag, but pushing data from JavaScript to C++ will be important to get right. I would guess there should be some handlers for common functions such as generating blocks of bytes with a certain kind of wave signature.

I also totally agree with the suggestion that this stuff gets really CPU heavy; if you work with professional audio applications like Cubase, Pro Tools, Nuendo, etc., you'll know what we're talking about. The more audio processing you add, the closer you are to the buffer limit and to hearing all those clicks and pops. So putting it all in JavaScript would eventually limit application development.

That being said, there is no reason you can't do all these things in JavaScript too, to some extent, but there will be a much lower ceiling.
(In reply to comment #30)
> So to riff on a few things Rui Lopes says: you could have a C++ layer that was
> high-resolution, that would make the appropriate calculations for the
> JavaScript.
Can you be more precise about what you mean by "high resolution?" The three major platforms will give you 1ms resolution (with a few exceptions depending on your software/hardware). On Chrome, if you're on battery, you will get poor resolution (15ms) on Windows IIRC.
Are you sure you want high-resolution timers?  Maybe what you really want is high-resolution scheduling, with the ability to base actions on where the audio clock is?  (This is what we do with video playback - the sync signal is based on the playback time, not the audio input time or some other clock.)
(In reply to comment #31)
> (In reply to comment #30)
> > So to riff on a few things Rui Lopes says: you could have a C++ layer that was
> > high-resolution, that would make the appropriate calculations for the
> > JavaScript.
> Can you be more precise about what you mean by "high resolution?" The three
> major platforms will give you 1ms resolution (with a few exceptions depending
> on your software/hardware). On Chrome, if you're on battery, you will get poor
> resolution (15ms) on Windows IIRC.

I guess what I am getting at is that for music to sound good (especially if you are using sounds like long strings), you have to be careful not to drift too much or it will sound like someone is dragging their finger across the vinyl record, slowing the belt drive down; and it's amazing how well the ear picks up on those little imperfections.

So with regard to high-resolution timers, I would say the point would be to make sure that the 1ms that just passed was as close, in shakes of a quartz crystal, to the 1ms that preceded it.

But as Chris Blizzard pointed out, scheduling with the existing audio clock is what you really want, regardless of how high a resolution you can get. You would then (I assume) create some kind of memory buffer in C++ that pre-loads all the audio so everything appears to be seamless.

It will be interesting to test where the sweet spot is when scheduling with the audio clock, and compare that to the buffer sizes of other audio applications.
Is someone going to pick up my rant about "2 channels" vs "stereo"? I get a bad feeling this part is no one's actual point of interest, and it might end up going into the final version without change unless someone keeps constantly pointing it out. Is there something I should do, other than keep posting here?
(In reply to comment #27)
> In the current draft the user species a number of channels. There is an example
> piece of code that does audioOutput.mozSetup(2, 44100, 1); This may seem
> alright at first, but turns out to be a bad idea. We recently went trough this
> while designing a piece of desktop audio software.

I second this (also from past experience developing desktop audio software). Specifying the number of channels isn't enough - you really need to know how those channels are mapped (i.e., placement and hardware). Without exposing that, developers wanting to use more than 2 channels will need to guess and make assumptions about the audio setup - which will lead to some strange results when they get it wrong.
Sorry, I forgot to comment on that. It does sound like a great idea, actually; probably more "necessary" than "great". Any decent audio software will allow you to specify the route to specific hardware devices; without that I could definitely see a) a lot of people getting upset, b) lots of creativity missed, and c) back-tracking to provide hardware mapping after the fact, which could be very messy for everyone involved.

Some kind of handling for people who do not care about which hardware devices are being mapped might be nice?
Some great ideas, and in reply to comment 34, don't worry, nothing is getting jammed through here.  It's not clear if any of this will be accepted, or if it does, what it will look like.  I think we need to try some more things, and get a better sense of what this should look like.  Working demos talk.

To that end, can I invite anyone in this bug who is an audio/html5 hacker to please join us as we get started on another demo blitz.  We submitted a talk to http://www2010.org/www/ and it was accepted.  We're starting to brainstorm and plan some more demos here:

https://wiki.mozilla.org/User:David.humphrey/DemoIdeas

Our goal is to try and get more people in the browser/html5 space excited about this idea, so every little bit helps.  Get in touch if you're interested, and please keep throwing ideas in here so we can get something reviewable/landable.
David, I'll be attending WWW2010, so I'll meet you there for some brainstorms.
Updated patch which backports some of the secondary buffering that's on trunk now (I need to update the whole thing to trunk, but the code has changed a lot and I haven't had time).  We just presented this work at WWW2010, and got a really positive response.  Here's a post with links and videos of new demos.  The audio write stuff has become a lot more refined:

http://vocamus.net/dave/?p=1074

I'm not sure what the proper next steps are here, in terms of getting this into shape for review.  I think I probably need some advice and a bit of someone's time to figure things out.
Attachment #431280 - Attachment is obsolete: true
We can start that process if you think you're ready.  Here are the questions that I would ask before adding a new API to Firefox:

1. Does it work reliably cross platform?
2. Are there platform differences that will show up in the API?
3. Are there differences that we might think about exposing because there's a huge amount of benefit in doing so?
4. Does the new API have a set of tests that we can automatically run as part of our build + test process?
5. Do we have reasonable expectation that the API in question is close to what will be final?
6. Do we think that the standards process that follows will alter the scope of what these APIs cover today?

If we're comfortable with the answers, I think we can start the code review process.
Oh, another thought.  Are there parts of infrastructure here that we might be able to use for an eventual device api / reading from camera + mic input functionality?  (That's probably a question for cpearce and others.)
I'm very happy to help out with getting this reviewed, etc., but I'm not going to have time in the next two or three weeks, sorry.  Definitely after that!
Is there a spec or other document describing the API and how it's supposed to work?
Thanks Matthew, that timing is fine.  I have some more work I want to do on this in the meantime anyway, including getting some of what Chris discusses above done.  Also, we just tried updating this to trunk, and the recent ogg backend changes in bug 531340 have killed performance for my current implementation on Mac (I suspect it's now getting flooded with DOM events due to the smaller frame sizes and change from float to ms positional timing).  I'm going to see if I can fix this on my own, but will likely need some advice.  I'll aim to have a first reviewable patch up by end of May.

Chris, see https://wiki.mozilla.org/Audio_Data_API.  I need to update that for a couple of minor changes, but it's mostly correct.
Thanks David, where do you want discussion to take place about the API? Comment on the wiki, here or on a newsgroup?
(In reply to comment #45)

I'm happy to do it wherever.  Maybe here so it's visible, and yet not something we have to setup a new list for?  If someone has a better suggestion, we're open.
As I have pointed out before, mozSetup(channels, sampleRate, volume) is not good enough. In a way, the channels parameter should be changed to something that describes what kind of data we are expecting or what we are planning to do with the data. However, I just realized I'm not sure what the correct parameter name would be.

We could call it format, to indicate that it decides the format of the input data: mozSetup(format, sampleRate, volume). Or we could alternatively call it model, to indicate that it decides the output model: mozSetup(model, sampleRate, volume). As we have no real control over the actual output model, I guess we should call it format.

Another thing that is unclear is the type of the variable. We could use a string identifier or an integer. Using an integer has the risk that users may still end up thinking about the number of channels. If we were using integers, we might want to use some arbitrary numbers to make sure people don't get confused. But then we end up explaining to people why stereo is called "mode 37".

As we are working with an API at a relatively high level, I thus suggest that we make the variable a string, which needs to be one of a few predefined format identifiers.

So we would have...
  mozSetup(format, sampleRate, volume)

And we would call it like...
  audioOutput.mozSetup('stereo', 44100, 1);
drive-by: I still really don't like having the spectrum pre-calculated.  If no one has done any benchmarks implementing CalculateFFT in JS, using typed arrays for the source and destination, that work really needs to happen -- if it's not much slower than the native side, then that work should live in JS (and it should be up to the app to decide whether it wants it performed or not, instead of always paying that cost).
I just went ahead and did that --
http://people.mozilla.com/~vladimir/misc/jsfft.js
http://people.mozilla.com/~vladimir/misc/nfft.cpp

(Note that the cpp impl is faster than what's in the patch -- the patch should not be using nsTArray's to store constant-sized arrays, nor AppendElement to build up constant-sized arrays; that would be a large speedup if we end up keeping the native impl.)

JS                  :  1000 FFTs: 593 ms (0.593ms per FFT)
MSVC, no opt        :  1000 ffts: 639.000000 ms
MSVC, -O1           :  1000 ffts: 530.000000 ms
MSVC, -O2           :  1000 ffts: 234.000000 ms
MSVC, -arch:SSE2 -O1:  1000 ffts: 483.000000 ms
MSVC, -arch:SSE2 -O2:  1000 ffts: 496.000000 ms

Our standard compile flag is -O1 for space, so a JS impl is 2x slower (though still half a millisecond on my laptop).  I'm sure we can speed it up significantly, and there may well be a better way to structure the code for JS as well.

Based on this, I'd suggest that the spectrum piece be removed from the API.
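(For anyone wanting to reproduce these numbers, a minimal timing harness along these lines would do; fft() stands in for the implementation in jsfft.js linked above and is assumed to transform its arrays in place.)

// Time 1000 FFTs over a 4096-sample buffer, mirroring the benchmark above.
// fft(real, imag) is assumed to transform the arrays in place (see jsfft.js).
var N = 4096;
var real = new Float32Array(N);
var imag = new Float32Array(N);
for (var i = 0; i < N; i++) {
  real[i] = Math.sin(2 * Math.PI * 440 * i / 44100);  // test signal
  imag[i] = 0;
}

var iterations = 1000;
var start = Date.now();
for (var run = 0; run < iterations; run++) {
  fft(real, imag);
}
var elapsed = Date.now() - start;
console.log(iterations + " FFTs: " + elapsed + " ms (" + (elapsed / iterations) + " ms per FFT)");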
Thanks, Vlad.  I'm sure there are a lot of other things I'm doing that can be optimized or improved, too.  Our experience building demos with the two methods now is that as you pile on more and more things (e.g., the fft stuff is just the start, then you do dsp or beat detection, visuals in 2d or 3d, etc.) the lag you get with the js method is noticeable.  However, we haven't been using your webgl arrays, nor is my impl as tight as it could be, so it's worth us testing both again when I've done that.

----------

Toni, I'm not sure what to say to this, and would value input from other
people.  I don't really have much control over what gets created...it's just a
number of channels for me too, see:

http://mxr.mozilla.org/mozilla-central/source/media/libsydneyaudio/include/sydney_audio.h#320

I can come up with all sorts of strings or codes, but in the end I just pass a
number, same as the API now.
The API allows you to "Set the mapping between channels and the loudspeakers". Thus it is in theory possible to support at least "stereo" and "reverse stereo". This however is not too important. In fact we could drop the parameter and only support stereo, as turning mono into stereo is not too hard, and more complicated models are not too important at this stage. It might be good to just pass the string "stereo" for now, so one could add support for "dolby 5.1" or some other funky standard in the future. As that would be a future extension, it does not matter whether the API you are using now supports it or not.
@Toni Ruottu: If I have understood you correctly, I would have to say dropping surround support in an audio API would be somewhat of a disappointment.

RE: 5.1: this is a great use-case for using this audio-data technology on video streams.

Most DVD players/TV Systems with surround-sound connections will "process" a stereo signal to create rear channels with altered stereo-space, creating a fake, but pleasing surround sound.

One of the techniques you can employ here is to mix an inverted mid-side signal with the original wave and adjust the stereo separation, somewhat in the vein of Bob Katz's K-Stereo stereoization techniques: http://www.digido.com/k-stereo.html

I think it would be great to be able to add surround to videos that do not already have a surround channel, by building a small JavaScript surround processor. Perhaps I should demo this with the current API so people can see a working example of this.
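(For readers unfamiliar with mid-side processing, a small sketch of the decomposition step involved; the recombination and phase-shift stages of the technique described above are omitted.)

// Split interleaved stereo samples [L0, R0, L1, R1, ...] into mid and side signals.
// mid = (L + R) / 2 carries the centred content; side = (L - R) / 2 carries the ambience.
function midSideSplit(interleaved) {
  var frames = interleaved.length / 2;
  var mid = new Array(frames), side = new Array(frames);
  for (var i = 0; i < frames; i++) {
    var left = interleaved[i * 2];
    var right = interleaved[i * 2 + 1];
    mid[i] = (left + right) / 2;
    side[i] = (left - right) / 2;
  }
  return { mid: mid, side: side };
}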
You are right. We should leave room for innovation. My original suggestion supports that. The change I would like to see in the current prototypes and drafts is stating audioOutput.mozSetup('stereo', 44100, 1); when one wants regular stereo sound. This removes the direct mapping between number of channels and interpretation, making it possible to have multiple different interpretations for a given amount of channels.

Please tell me, if you think this single idea is flawed. In case it is not flawed, I'm hoping to see people working around drafts and prototypes to reflect it. Someone might think that the idea is simple, and it might be, but even with simple ideas it is important to see how they work in practice. And in the end I'd rather have people commenting on the prototypes and drafts, than on a bug thread.
Can we talk to Monty (Chris Montgomery, xiphmont on irc) about the channel mapping issues?  He just did a pile of work on surround support in Vorbis and has a pretty detailed understanding of how the various operating systems handle mappings and the issues therein.  iirc, it's not as obvious as you might expect.
@Toni Ruottu: I went half way to creating a surround signal from a stereo signal. You can check the results here: http://code.bocoup.com/audio-data-api/examples/ambient-extraction-mixer/

The next phase (hehe) is to add some phase shifting to the ambient signal. I have surround sound gear here so if David Humphrey &&|| Chris Montgomery can expose some surround channel goodness, I can build some compelling test cases (provided the user has their computer hooked up to SS).

@Toni Ruottu: I see what you are saying now re: "stereo" as opposed to '2'. I definitely agree with you there. The API should then expose the channels based on the mapping description... "5.1", "4.1" and so on. The next part of the puzzle would be to determine how these channels are accessed in the API. I bet taking a quick peek at the Creative/DirectX SDKs would give us a good idea for a standard.
(In reply to comment #49)
> I just went ahead and did that --
> http://people.mozilla.com/~vladimir/misc/jsfft.js
> http://people.mozilla.com/~vladimir/misc/nfft.cpp
> 
> (Note that the cpp impl is faster than what's in the patch -- the patch should
> not be using nsTArray's to store constant-sized arrays, nor AppendElement to
> build up constant-sized arrays; that would be a large speedup if we end up
> keeping the native impl.)
> 
> JS                  :  1000 FFTs: 593 ms (0.593ms per FFT)
> MSVC, no opt        :  1000 ffts: 639.000000 ms
> MSVC, -O1           :  1000 ffts: 530.000000 ms
> MSVC, -O2           :  1000 ffts: 234.000000 ms
> MSVC, -arch:SSE2 -O1:  1000 ffts: 483.000000 ms
> MSVC, -arch:SSE2 -O2:  1000 ffts: 496.000000 ms
> 
> Our standard compile flag is -O1 for space, so a JS impl is 2x slower (though
> still half a millisecond on my laptop).  I'm sure we can speed it up
> significantly, and there may well be a better way to structure the code for JS
> as well.
> 
> Based on this, I'd suggest that the spectrum piece be removed from the API.


Thanks Vlad for putting these benchmarks together. I have optimized the JS example and would love it if you could benchmark it on your machine.

http://weare.buildingsky.net/processing/dsp.js/examples/jsfft-corban32.js
http://weare.buildingsky.net/processing/dsp.js/examples/jsfft-corban64.js

I am guessing this will bench on your machine at around 390ms-420ms per 1000 FFTs.

Do you see any other areas we could optimize?
I really would like to emphasize the gist of comment #29: The writing API should definitely be event based - not only because setTimeout's resolution might not be sufficient (something one could work around given the possibility of checking how full the audio buffer currently is), but also because it creates a more natural way of designing audio apps. Case in point: the following established sound APIs all use a callback-based API or at least something vaguely resembling one: Flash 10 has already been mentioned; PortAudio, a portable audio output library written in C; CoreAudio (more specifically the AudioQueueServices) on OS X; JACK; ALSA (it's interrupt based) on Linux; ASIO (by Steinberg) and DirectSound for Windows; VST; LADSPA.

What this also means (at least in my understanding) is that down below, the Audio object is probably based on a callback model anyway, because almost any operating system sound API works that way. While it might not be possible to expose this directly to the JS APIs, I don't think it should be too hard to implement.

That being said, I am pretty sure this is harder to do (and to get right) than the current raw write implementation and unfortunately I don't feel like I can help a lot with it (although I would like to), so I would at least like to hear from the people who have implemented the current proof of concept API how hard a callback based API would be to implement and if there are any hard technical impediments. 

In case you are wondering why I care, here's my take on software synthesis in the browser:

http://webloop.pixelpoke.de/
http://github.com/halfbyte/js-synth-experiments
http://jan.krutisch.de/en/2010/01/26/an-update-on-the-web-synth.html
(In reply to comment #57)
> What this also means (at least in my understanding) is that down below the
> Audio object is probably based on a callback based model anyway, because almost
> any operating system sound API works that way. 

Unfortunately it's not. The underlying audio library we use is not callback based. I wish it was and there are plans to make it so.
JACK uses that approach "because almost any operating system sound API works that way", but also because they want to provide tight synchronization between multiple sound applications.

The model is a bit harsh on users: when an audio application fails to provide the demanded sound in time, the application is kicked out. Thus, where a non-callback app would just sound silly for half a second, a callback-based application needs to re-register with the sound system.

There may be other ways of implementing a callback based approach, but I think Jack-project has encountered all the related problems. They still think it is the right approach as it enables applications to do some things that are not normally possible.

The sound API design discussion never covered synchronizing multiple sites, i.e. what I should do if I have a drum machine on one site, a synthesizer on another, and I want to record the result in an application running on a third site.

If we need such co-operating sites, a callback-based approach may well be the right design decision. I think there are still some inherent problems that exposing a callback sound API to application developers will bring. It is a lot easier to throw a piece of wave data at a black-box player robot every once in a while, and let things fail if our sound generation is too slow.
Attached patch [WIP] experimental work v. 13g (obsolete) — Splinter Review
Almost done and hope to have this finished and ready for review by end of next week.  I've documented 95% of the changes to the API at https://wiki.mozilla.org/Audio_Data_API, including many examples of how to use it.  I'll finish the rest before I submit for review.

I blogged about a bunch of the new things we've built using this API here http://vocamus.net/dave/?p=1092.
Assignee: nobody → david.humphrey
Attachment #442712 - Attachment is obsolete: true
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: x86 → All
How about "stereo" for stereo? You can continue to support number 2 beside it, if you wish to have backwards compatibility with all the demos out there.
I'd still like to hear from other people on this, but my personal opinion is that we should just stick with the SMPTE/ITU-R recommendations and use integers.  If you look at Vorbis, you find:

http://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9

Same for flac, see http://flac.sourceforge.net/format.html and search for "Channel assignment".  Same for the Mozilla implementation.

So my vote is "when in Rome," use an integer.  If others disagree, it can be changed.  Toni, also note that nothing I've written so far has been reviewed or discussed outside our group, so this sort of thing can still be addressed in the coming weeks.
Attached patch Patch (v. 13h) (obsolete) — Splinter Review
Chris, Chris, Matthew -- not sure which of you to give this to for review, but starting with Matthew based on comment 42.  I know you're all slammed with WebM, so if there is someone else with more time, let me know and I'll change it to them.

This patch implements the API described in detail at https://wiki.mozilla.org/Audio_Data_API.  I won't repeat what I've written there, except to say that we've tried hard to keep this from becoming a huge API that dramatically changes the current HTML5 audio/video stuff.  Instead, we've worked to integrate with it, which has influenced some of our design decisions.  The API is well tested, and many demos have been created using it over the past six months.  I've blogged about them here: http://vocamus.net/dave/?cat=25.

Here are some notable examples of demos we've built, which will be useful for reviewing and testing:

* Visualizing audio, beat detection, WebGL - http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13g.html
* Text to Speech in JS - http://scotland.proximity.on.ca/dxr/tmp/audio/tts/
* Port of Pure Data to JS - http://mccormick.cx/dev/webpd/demos/xmen-sample-looper/
* Audio Sampler in JS - http://weare.buildingsky.net/processing/dsp.js/examples/sampler.html

Many more demos are available on the API wiki page.  We have had a lot of developers work with this, and their feedback has been taken into account as we've evolved the work since December.  The reaction of the web, audio, and digital music communities to this work has been overwhelming.  We've heard from many people who don't normally get excited about the web, and they are excited about this!

This patch does not include tests, and that's one of many things I'd like feedback on so I can do this properly.  Also, I have left a few XXXdh comments in the code where I had questions I couldn't answer.

Based on comment 48 and comment 49, the native FFT has been removed, and we now do everything in JS using WebGL typed arrays to give us the speed-up we need.  We've proven Vlad right about being able to dump the C++ implementation (e.g., the first demo above is all done with JS).

Our work has proven that JS is fast enough to process, visualize, and generate audio in the browser (you'd be amazed at how many people *still* tell us it isn't possible, but we have the demos to prove it).  We've also shown that the web is ready for audio data to become a part of HTML (the W3C Audio Incubator group gives us hope of this going beyond Mozilla, too).

I have a lot of time in June to devote to this (less in July/August), and if there is anything I can do to make the review process easier, or things I can fix/improve, I'll be happy to do that (we hang out in #audio on moznet).  We're excited about refining this so that Mozilla users can start having as much fun as we're having using it.

Although I'm bringing this patch to you now, many people have worked with me on the project.  Among them I want to single out Yury Delendik, who has worked actively with me on this patch in the latter stages of its development.  I'd also like to thank Chris Pearce for his contributions, and all of you who wrote the content/media code, which has proven fertile ground for our explorations thus far.

Thanks for your consideration of this API and implementation.
Attachment #448127 - Attachment is obsolete: true
Attachment #448900 - Flags: review?(kinetik)
Thanks David! A few comments on the API:

I think instead of having nsIDOMNotifyAudioMetadataEvent we should just add mozChannels, mozSampleRate and mozFrameBufferLength to the HTMLMediaElement, and specify default values that are returned before "loadedmetadata" has fired (and if there is no audio) --- '0' is probably fine. That is more consistent with 'duration', and it's simpler.

Why not make mozCurrentSampleOffset and mozTime doubles, in seconds, to match 'currentTime' and 'duration' on media elements?

> The best time to call mozSetFrameBufferSize() is in the loadedmetadata event,
> when the audio info is known, but before the audio has started or events begun
> firing.

If the element is "autoplay", then the audio might start even before "loadedmetadata" handlers have run, so you might want to mention that you don't want "autoplay" in this case.
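(A hedged sketch of the configure-then-play pattern being suggested here; mozSetFrameBufferSize follows the draft naming quoted above, which changed later in the review.)

// Configure the framebuffer size before playback starts, and avoid the autoplay
// attribute so no race with loadedmetadata can occur.
var audio = new Audio("song.ogg");
audio.addEventListener("loadedmetadata", function () {
  audio.mozSetFrameBufferSize(2048);  // draft API name from the quote above
  audio.play();                       // only start playing once configured
}, false);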
Why are the extensions only on nsIDOMHTMLAudioElement? I think you should at least be able to read the audio from a <video> element, so mozSetFramebufferSize and mozCurrentSampleOffset belong on HTMLMediaElement. I guess it makes sense to only allow writing audio through an <audio> element.

You might also want to clarify in your documentation/spec that the AudioWritten event fires when data has been written to the audio stream, and the audio may or may not have been played yet. You should also clarify that the data in mozFrameBuffer has not been adjusted for any mute/volume set on the source media element.
The event-based API as it stands seems to preclude any kind of processing that requires reading of audio data faster than realtime. 

I.e., I would have to wait 30 seconds if I want to do something with the last portion of a 30-seconds audio piece. This is because audio is decoded on playback, which happens realtime, and no faster than that. Is this interpretation correct?

If so, it would prohibit an entire class of applications that are not based on the processing-as-you-play paradigm. Just one example: a web-based audio editor. You'd want to load audio and display a spectrogram and oscillogram over an arbitrary time interval in a fraction of a second. This can be done today faster than realtime with base-64-string-encoded audio, admittedly a hack. How can this be done more elegantly with the present API?
Sorry for the delay.  I'll review this in the next couple of days, promise!
No problem, that sounds great.  I'll have a new version of this patch up in the next 12 hours to address roc's issues above.
(In reply to comment #66)
+1
Another use case: audio enrichment for web publishing. In Scenari (a XULRunner app and Firefox extension), we used a C XPCOM implementation, but it was too difficult to maintain for all platforms. Now we work with the Flash plugin :( !
Markus Walther's proposition sounds very good.

Is it technically out of scope?
Attached patch Patch (v. 13i) (obsolete) — Splinter Review
This patch addresses roc's suggestions (thanks for the feedback):

* Removed nsIDOMNotifyAudioMetadataEvent
* Added mozChannels, mozSampleRate, mozFrameBufferLength to nsHTMLMediaElement
* Removed mozSetFrameBufferLength from nsHTMLAudioElement
* Converted mozTime (in AudioWritten event) to float value (seconds instead of ms)

I've also updated the documentation for these changes, plus the other specific doc changes he suggested.  Rather than breaking all our existing demos and builds, I've started a new wiki page that I'll keep up to date during review:

https://wiki.mozilla.org/Audio_Data_API_Review_Version

A couple of things I disagreed on, and haven't changed.  First, I haven't changed mozCurrentSampleOffset to doubles, since we really need to know the actual sample number (this is a whole number indicating the number of samples since 0).  All of our events and methods are sample based, and so knowing position information with a resolution of samples vs. seconds is important for knowing how much data to write.  Second, I've left mozCurrentSampleOffset on the HTMLAudioElement instead of moving to HTMLMediaElement because it relates to the audio stream used for writing only, not to the one used if there is a src attribute.  Perhaps the answer is to add it for read-only audio too, but it's not really useful unless you're trying to keep track of where you are for writing.

One thing I'm not sure about here is whether it's good practice to have an attribute throw if values are not within an expected range.  It might be that setting mozFrameBufferLength should happen via a method instead, I wasn't sure.  For now I did it as a readable/writable attribute.

Finally, in response to comment 66 and reading audio data faster than real-time...I can see how this would be valuable for sure, but I'm not clear how you'd do it without more invasive changes to the media code than I've done here.  I'll let Matthew/Chris/Chris speak to this.  Perhaps it can be done in this bug; perhaps it's another bug; perhaps it needs to get taken to the W3C AudioXG for future consideration.
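For reference, a rough usage sketch of the attributes described above (13i naming; element ids, buffer sizes and helper names are made up for illustration):

// Read side: per-element metadata and the writable frame buffer length.
var audio = document.getElementById("src-audio");
audio.addEventListener("loadedmetadata", function () {
  var channels = audio.mozChannels;       // e.g. 2
  var rate = audio.mozSampleRate;         // e.g. 44100
  audio.mozFrameBufferLength = 2048;      // throws if outside the allowed range
}, false);

// Write side: everything is sample-based, so mozCurrentSampleOffset() says
// exactly how far playback has progressed in samples.
var out = new Audio();
out.mozSetup(2, 44100, 1.0);              // channels, rate, volume (13i still takes volume)
var samplesWritten = 0;
function write(samples) {
  samplesWritten += out.mozWriteAudio(samples);            // returns the amount actually written
  return samplesWritten - out.mozCurrentSampleOffset();    // samples queued but not yet played
}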
Attachment #448900 - Attachment is obsolete: true
Attachment #450138 - Flags: review?(kinetik)
Attachment #448900 - Flags: review?(kinetik)
Does your code have any special-case handling for cross-origin media? We support loading media from servers other than the server hosting the HTML that contains the <audio> or <video> tag, just like with <img>. It's my understanding that if you draw a cross-origin <img> to a <canvas> context, it sets a flag on that canvas such that you are no longer allowed to get data out (by peeking at pixels or using toDataURI). Do we need a similar cross-origin check here, so that we don't allow audiowritten events to fire on cross-origin media? A non-theoretical attack I can think of is that if you had a streaming <video> or <audio> of a confidential meeting, and a malicious page can guess the URL of the stream, it could load that same stream and use the audiowritten events to capture the audio data.
Looks good, with one comment... is there any reason that arrays of floats are the only supported data format?  It seems like it would be nice to support 16-bit and 32-bit integers in addition to normalized floats, given that it would let you avoid floating-point math on the JS side (which will be a good bit faster, especially on mobile devices), and avoid doing conversions on the native side, since a lot of your source data is going to be 16-bit etc. anyway.

Well, one more comment, but it's a little bikeshed-y; "audioWritten" sounds really weird as an event name (because no one did the explicit writing); "audioAvailable" seems better.
(In reply to comment #70)
> A couple of things I disagreed on, and haven't changed.  First, I haven't
> changed mozCurrentSampleOffset to doubles, since we really need to know the
> actual sample number (this is a whole number indicating the number of samples
> since 0).  All of our events and methods are sample based, and so knowing
> position information with a resolution of samples vs. seconds is important for
> knowing how much data to write.  Second, I've left mozCurrentSampleOffset on
> the HTMLAudioElement instead of moving to HTMLMediaElement because it relates
> to the audio stream used for writing only, not to the one used if there is a
> src attribute.  Perhaps the answer is to add it for read-only audio too, but
> it's not really useful unless you're trying to keep track of where you are for
> writing.

That sounds fine.
Ted's right, we need to do a same-origin check here and not deliver data for cross-origin loads.
(In reply to comment #72)
> Looks good, with one comments... is there any reason that arrays of floats are
> the only supported data format?  It seems like it would be nice to support
> 16-bit and 32-bit integers in addition to normalized floats, given that it
> would let you both avoid floating point math on the JS side (which will be a
> good bit faster, especially so on mobile devices), and let you avoid doing
> conversions on the native side since a lot of your source data is going to be
> 16bit etc. anyway.

Our Vorbis decoder outputs floats. Conceivably we could use Tremor and get integers, but that would be a big change and it's not clear it would be worth it. Only WAV files would give us integer samples today.

We definitely don't want JS authors to have to have two code paths, one for floats and one for integers, so if we were to deliver integers to JS, we'd have have JS specify the desired format somehow.

I think for now we should probably just leave things as-is. If it's desirable in the future to support returning integer samples, we could add new attributes to the AudioWritten event, e.g. mozFrameBufferInt32 and mozFrameBufferInt16. Then we could store the samples internally in the AudioWritten event in whatever format came out of the decoder, and convert them to the requested format on demand.
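For what it's worth, authors who want integers today can convert the float frames themselves; a small sketch (mozFrameBufferInt16/Int32 above are only hypothetical, so this works on the existing float buffer):

function floatToInt16(frameBuffer) {
  var ints = new Int16Array(frameBuffer.length);
  for (var i = 0; i < frameBuffer.length; i++) {
    // Clamp to [-1, 1] and scale to the signed 16-bit range.
    var s = Math.max(-1, Math.min(1, frameBuffer[i]));
    ints[i] = s < 0 ? s * 32768 : s * 32767;
  }
  return ints;
}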
> Well, one more comment, but it's a little bikeshed-y; "audioWritten" sounds
> really weird as an event name (because noone did the explicit writing);
> "audioAvailable" seems better.

No, I knew the name would need to change, so bikeshed away :)  I'm fine with audioAvailable, if others are.  Question: should it be mozAudioAvailable and the event properties named without moz*, or audioAvailable with properties using moz* like I have now?

I'll look into adding a same-origin check, but not do a new patch until Matthew does his review in the next few days.
(In reply to comment #76)
> > Well, one more comment, but it's a little bikeshed-y; "audioWritten" sounds
> > really weird as an event name (because noone did the explicit writing);
> > "audioAvailable" seems better.
> 
> No, I knew the name would need to change, so bikeshed away :)  I'm fine with
> audioAvailable, if others are. 

One more comment here: the name should fit with a potential future video frame(s) equivalent; e.g. "videoAvailable".  videoFramesAvailable?  audioFramesAvailable? iCanHazAudio/iCanHazVideo?
Their spec is insanely complicated. We could address most of the problems they have with JS by extending our approach to enable audio processing in Workers. I think we should just go ahead with our approach for now.
Sorry, just saw this comment:

(In reply to comment #56)
> Thanks Vlad for putting these benchmarks together. I have optimized the js
> example and would love if you could benchmark it on your machine. 
> 
> http://weare.buildingsky.net/processing/dsp.js/examples/jsfft-corban32.js
> http://weare.buildingsky.net/processing/dsp.js/examples/jsfft-corban64.js

I upped the count to 10000; 1000 wasn't enough to get a clean sample.  Here are the numbers; it's a pretty significant speedup!

corban32: ~3100ms (0.310ms per fft)
corban64: ~2975ms (0.297ms per fft)
Woo Hoo.

Thanks Vlad, that was better than I thought.
Based on irc discussions with roc, I put together a simulation for testing FFT calculation on mobile (I don't have a device to test this, or make a real audio build, or I'd just do that).

http://scotland.proximity.on.ca/dxr/tmp/audio/fft/index.html

This uses real samples (i.e., I ran an audio data enabled build and extracted 10K samples so it would be valid data), and does what we do when we calculate FFT for visualizations with js.

On my Mac (10.6) with a nightly I get:

Total Time = 2129ms (0.2129 ms per FFT)

Blizzard tested this on an Evo 4G and got:

Total Time = 25364ms (2.5364 ms per FFT)

However, I don't think that Fennec has typed arrays yet (correct?).  Without them, these numbers are pretty much meaningless.  For example, here is what I get in 3.6, without typed arrays:

Total Time = 8348ms (8.348 ms per FFT)
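For context, the harness is essentially this shape (structure assumed, not copied from the linked page; it presumes a dsp.js-style FFT object with a forward() method, and `signal` is a Float32Array holding one buffer's worth of the extracted samples):

var ITERATIONS = 10000;
var fft = new FFT(2048, 44100);        // bufferSize, sampleRate
var start = Date.now();
for (var i = 0; i < ITERATIONS; i++) {
  fft.forward(signal);                 // one FFT pass per iteration
}
var total = Date.now() - start;
// Reported as: "Total Time = <total>ms (<total / ITERATIONS> ms per FFT)"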
Need to compare the native code as well; also, JS is still woefully unoptimized for ARM.  I'll do some benchmarks when I get a chance.
Comment on attachment 450138 [details] [diff] [review]
Patch (v. 13i)

Sorry for the delay getting this reviewed.  As discussed on IRC, I'm
reviewing this with the agreement that we're aiming to get something landed
to make testing and development easier, and that this is unlikely to be the
final API (and, thus, major changes may occur after this version lands).  Here's my first pass of the code review.  I'll make comments on the API separately.

The content changes need a content peer to review them, I'm not qualified.
I skipped over the QS stuff completely as I'm not familiar with it.

+++ b/content/html/content/public/nsHTMLAudioElement.h
+#include "nsAudioStream.h"

Not needed?

+++ b/content/html/content/src/nsHTMLAudioElement.cpp
+nsHTMLAudioElement::MozSetup(PRUint32 aChannels, PRUint32 aRate, float aVolume)

Does it make sense to use the volume attribute on the element for this?  It
seems strange to have multiple ways to set it.

+  mAudioStream = new nsAudioStream();
+  nsresult rv = mAudioStream->Init(aChannels, aRate, nsAudioStream::FORMAT_FLOAT32);
+  if (NS_FAILED(rv))
+    return rv;
+
+  if (!mAudioStream)
+    return NS_ERROR_FAILURE;

NULL check is performed too late.  You can remove it completely, though,
because new is infallible.

+  if (mAudioStream) {

Make this if (!mAudioStream) and early error return, then the rest of
function can be unindented.

+    if (aDataLength % mChannels != 0)
+      return NS_ERROR_DOM_INDEX_SIZE_ERR;
+    else {

No need for an else if the error path returns.

+  if (mAudioStream)
+    *_retval = mAudioStream->GetSampleOffset();
+  else
+    return NS_ERROR_DOM_INVALID_STATE_ERR;

Same deal.  Invert the test and unindent the success path.

+  if (mHasAudioStream)
+    *aMozChannels = mChannels;
+  else
+    *aMozChannels = mDecoder ? mChannels : 0;

If there's no audio stream and no decoder, should this return an error
rather than success with 0 channels?

Same question about GetMozSampleRate.

+PRInt64 nsAudioStream::GetSampleOffset()

I don't think this should return 0 on error.  Also, it seems like
GetSampleOffset and GetPosition can share most of their code.  GetPosition
can call GetSampleOffset and then convert the result to milliseconds.

+void nsAudioWrittenEventManager::DispatchPendingEvents(PRUint64 aCurrentTime)
+  if (mPendingEvents.Length() > 0) {
+    do {
+    } while (mPendingEvents.Length() > 0);

Simplify this by removing the if and moving the while test to the top of
the loop.

+    mSignalBuffer = new float[currentBufferSize];
+    if (nsnull == mSignalBuffer)
+      return;

New is infallible.

+private:
+  // Manager for queuing and dispatching audiowritten events.
+  nsAudioWrittenEventManager mEventManager;

This should be declared before the PRPackedBools.
Oh, one thought on the audio channel mapping issue: do what Vorbis does.  Specifically, use http://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9 as a mapping between channel numbers and positioning.  Our sound code doesn't support it yet, but I intend to fix that in the near future.
(In reply to comment #70)
> Created an attachment (id=450138) [details]
> Patch (v. 13i)


Looks good! Drive by review:

content/media/nsAudioWrittenEventManager.cpp is copyright 2007, should be 2010?

+void nsAudioWrittenEventManager::DispatchPendingEvents(PRUint64 aCurrentTime)

Would merging all the events with (e->mTime <= aCurrentTime) into a single event reduce the number of events sent, and so reduce the message dispatch overhead? Or is it undesirable to have audiowritten events with variable length frames?

+void nsAudioWrittenEventManager::Drain(PRUint64 aEndTime)

What about sound which has been pushed to the audio hardware but has not been played yet? Typically there's about 2 seconds of audio that has been pushed to the hardware but hasn't been played. Won't calling Drain() cause audiowritten events to arrive before their corresponding sound has played? This would cause a clump of audiowritten events to come in when decoding reaches the end of the media and nsAudioStream::Drain() is called.

 interface nsIDOMHTMLAudioElement : nsIDOMHTMLMediaElement
 {
+  // This is a dummy function; for JS it is implemented as a quickstub
+  // that calls the _explicit method.
+  void mozWriteAudio();
+
+  [noscript] unsigned long mozWriteAudio_explicit([array, size_is(dataLength)] in float dataPtr,
+                                                  in unsigned long dataLength);
+

Can you just declare this in the nsHTMLAudioElement.h file, rather than adding it to the IDL? Can you do something similar for the other [noscript] methods you declare in the IDLs? That would reduce the clutter in the IDLs.
Comment on attachment 450138 [details] [diff] [review]
Patch (v. 13i)

Adding jst for content/quickstubs feedback, as per discussions yesterday.  I'm going to fix the audio specific bits based on the above feedback soon.
Attachment #450138 - Flags: review?(jst)
Comment on attachment 450138 [details] [diff] [review]
Patch (v. 13i)

I'll review the content part.
Seems like the patch doesn't apply cleanly to trunk. There has been some changes for example in nsDOMEvent::SetEventType.
Attachment #450138 - Flags: review?(jst) → review?(Olli.Pettay)
I have an updated version of this that will apply, which I'll put up soon.  Thanks.
Comment on attachment 450138 [details] [diff] [review]
Patch (v. 13i)

Minor drive-by comment:

>+NS_IMETHODIMP
>+nsHTMLAudioElement::MozWriteAudio_explicit(float *aDataPtr, PRUint32 aDataLength,
>+                                           PRUint32 *_retval NS_OUTPARAM)
>+{
>+  *_retval = 0;
>+  if (mAudioStream) {
>+    // Make sure that we are going to write the correct amount of data based
>+    // on number of channels.
>+    if (aDataLength % mChannels != 0)
>+      return NS_ERROR_DOM_INDEX_SIZE_ERR;
>+    else {
>+      // Don't write more than can be written without blocking.
>+      PRUint32 available = mAudioStream->Available();
>+      PRUint32 writeLen = aDataLength;
>+      if (writeLen > available)
>+        writeLen = available;
>+
>+      nsresult rv = mAudioStream->Write(aDataPtr, writeLen, PR_TRUE);
>+      if (NS_FAILED(rv))
>+        return rv;
>+
>+      // Return the actual amount written.
>+      *_retval = writeLen;
>+      return rv;
>+    }
>+  } else
>+    return NS_ERROR_DOM_INVALID_STATE_ERR;
>+}

Awkward bracing etc. in the above function -- should have { } braces everywhere (especially if any of the if clauses has them), but it can also be made much simpler if it were written like this, with all the state checking up top instead of deeply nested:

{
  if (!mAudioStream) {
    return NS_ERROR_DOM_INVALID_STATE_ERR;
  }

  if (aDataLength % mChannels != 0) {
    return NS_ERROR_DOM_INDEX_SIZE_ERR;
  }

  ...
  return rv;
}

No need to set retval at the top either; shouldn't touch out args if an error is returned.
Attached patch Patch (v. 14a) (obsolete) — Splinter Review
This is updated to trunk (0628b5695f2b) and addresses issues identified by Matt, Chris, and Vlad (thanks, guys).  In terms of the API itself, changes include:

* renamed AudioWritten to AudioAvailable
* removed volume from mozSetup(), and now tie write stream into .volume/.muted code
* As per roc's suggestion earlier (comment 64) that mozChannels and mozSampleRate be zero by default, I have left these to return 0 vs. -1.  If that's wrong, let me know.

This still lacks tests, same-origin checks, and caching of typed array on audioavailable event.  I'm going to let smaug comment on this before I do any more work with it.  I'll also update the docs to reflect the changes in this patch later tonight.
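To illustrate the write-side change (names per this patch; the generated tone and buffer sizes are only an example):

var out = new Audio();
out.mozSetup(1, 44100);               // channels and rate only; no volume argument now
out.volume = 0.5;                     // the element's normal volume/muted state applies
var samples = new Float32Array(4410); // 0.1s of mono audio at 44100 Hz
for (var i = 0; i < samples.length; i++) {
  samples[i] = Math.sin(2 * Math.PI * 440 * i / 44100);   // 440 Hz tone
}
out.mozWriteAudio(samples);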
Attachment #450138 - Attachment is obsolete: true
Attachment #457307 - Flags: review?(kinetik)
Attachment #457307 - Flags: review?(Olli.Pettay)
Attachment #450138 - Flags: review?(kinetik)
Attachment #450138 - Flags: review?(Olli.Pettay)
First of all, great work guys! I'm not able to implement things, due to extreme lack of time, but have been peeking into this thread frequently.

However, I see that using events for audio data generation hasn't been tackled yet. From the earlier comments, I still haven't fully understood what the main blockers are to implementing the API in that direction.

I really do see using setInterval() for this as a big hack. I reiterate that Flash's approach, i.e. event generation and subscription, would be the best (most accurate) way to do this.

Cheers!
Comment on attachment 457307 [details] [diff] [review]
Patch (v. 14a)

>diff --git a/content/base/src/nsContentUtils.cpp b/content/base/src/nsContentUtils.cpp
>--- a/content/base/src/nsContentUtils.cpp
>+++ b/content/base/src/nsContentUtils.cpp
>@@ -595,16 +595,17 @@ nsContentUtils::InitializeEventTable() {
>     { nsGkAtoms::oncanplaythrough,              NS_CANPLAYTHROUGH, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeking,                     NS_SEEKING, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeked,                      NS_SEEKED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ontimeupdate,                  NS_TIMEUPDATE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onended,                       NS_ENDED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onratechange,                  NS_RATECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ondurationchange,              NS_DURATIONCHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onvolumechange,                NS_VOLUMECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>+    { nsGkAtoms::onaudioavailable,              NS_AUDIOAVAILABLE, EventNameType_HTML, NS_AUDIOAVAILABLE },
The event name should be MozAudioAvailable, and the atom onMozAudioAvailable.

And do we really need support for onMozAudioAvailable attribute/property or would
audio.addEventListener("MozAudioAvailable", listener, true); be enough?
If .addEventListener is enough, then there shouldn't be any need for NS_AUDIOAVAILABLE, and
also the change in nsDOMEvent.h/cpp wouldn't be needed.


> #ifdef MOZ_MEDIA
>   "loadstart", "progress", "suspend", "emptied", "stalled", "play", "pause",
>   "loadedmetadata", "loadeddata", "waiting", "playing", "canplay",
>   "canplaythrough", "seeking", "seeked", "timeupdate", "ended", "ratechange",
>-  "durationchange", "volumechange",
>+  "durationchange", "volumechange", "audioavailable",
MozAudioAvailable

>+NS_IMETHODIMP
>+nsHTMLAudioElement::MozWriteAudio_explicit(float *aDataPtr, PRUint32 aDataLength,
>+                                           PRUint32 *_retval NS_OUTPARAM)
aRetVal and remove NS_OUTPARAM
same also elsewhere



>+NS_IMETHODIMP nsHTMLMediaElement::GetMozSampleRate(PRUint32 *aMozSampleRate)
Newline after NS_IMETHODIMP
>+nsresult nsHTMLMediaElement::DispatchAudioAvailableEvent(float* aFrameBuffer,
>+                                                         PRUint32 aFrameBufferLength,
>+                                                         PRUint64 aTime)
>+{
>+  nsCOMPtr<nsIDOMDocumentEvent> docEvent(do_QueryInterface(GetOwnerDoc()));
Add nullcheck here.
>+[scriptable, uuid(6250652d-7a6a-49a4-a2ee-9114e1e83427)]
>+interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
This is a Mozilla-only interface, so no need for moz prefixes on the properties, IMO.
Especially if the event name will start with Moz


>+[scriptable, uuid(cd1a6a6b-bc4c-4e5a-b7da-53dccc878ab8)]
> interface nsIDOMHTMLAudioElement : nsIDOMHTMLMediaElement
> {
>+  // This is a dummy function; for JS it is implemented as a quickstub
>+  // that calls the _explicit method.
>+  void mozWriteAudio();
>+
>+  [noscript] unsigned long mozWriteAudio_explicit([array, size_is(dataLength)] in float dataPtr,
>+                                                  in unsigned long dataLength);
I wonder if this is really needed.
The quickstub code could probably get access to nsDOMHTMLAudioElement and call some
non-virtual method.
Attachment #457307 - Flags: review?(Olli.Pettay) → review-
This addresses the issues in comment 93, with 2 exceptions (see below), adds mochitests for the event and write operations, adds a same-origin check before dispatching MozAudioAvailable events, and adds code to cache the Float32Array exposed via MozAudioAvailable.frameBuffer.  When this goes in, there is a litmus test I'll add too, to test that generated audio data is the same as audio which gets played via the decoder.

I have left the onMozAudioAvailable attribute/property, since I think it's nice to have.  I wasn't sure how to address your last comment, and if you have more guidance, I'd appreciate it.

Further review comments notwithstanding, this patch is now complete, and I'd like to try and get it into Firefox 4 if at all possible.  Let me know if that's going to be an issue for either of your review schedules.
Attachment #457307 - Attachment is obsolete: true
Attachment #458730 - Flags: review?(kinetik)
Attachment #458730 - Flags: review?(Olli.Pettay)
Attachment #457307 - Flags: review?(kinetik)
Thinking more about the comments by Chris and Olli to clean-up the IDL, this version removes the quickstubs code, replacing it with implicit JSContext and jsval in IDL.  Because this stuff is so new, I'm not clear on best practices for using it yet--this seems cleaner to me, but some people have told me that I'd still need/want the quickstubs code (others have told me that's wrong).  Someone who knows more about this can guide me here, and choose between these two versions.
Attachment #458730 - Attachment description: Patch (v. 15b) → Patch (v. 15b) - quickstubs version
So which one is faster?
If you do event.frameBuffer in a loop, is the jsval version as fast as the
quickstub version?
I guess you should quickstub nsIDOMNotifyAudioAvailableEvent interface.
The quickstub version takes 10ms for 100000 iterations of getting event.frameBuffer; the other one takes 168ms. I'm now trying what you suggest in comment 97 (Jason said the same thing to me just now on irc).  I'll post those results soon.
This is the "no quickstubs, jsval in IDL" version above that was taking 168ms, but now with quickstubs for frameBuffer and mozWriteAudio, which makes it as fast as the custom quickstubs version (10ms for 100K iterations of getting .frameBuffer).  Given that this is as fast, and the code less complicated, I think this is the one to use.
Attachment #458730 - Attachment is obsolete: true
Attachment #459004 - Attachment is obsolete: true
Attachment #459030 - Flags: review?(kinetik)
Attachment #459030 - Flags: review?(Olli.Pettay)
Attachment #458730 - Flags: review?(kinetik)
Attachment #458730 - Flags: review?(Olli.Pettay)
Right, using jsval allows our quickstub generator to generate the quickstubs automatically, so that's definitely the way to go.
Comment on attachment 459030 [details] [diff] [review]
Patch (v. 15d) - jsval in IDL, replaced CustomQS with regular QS

>+GK_ATOM(audioavailable, "mozaudioavailable")
Do you actually use this somewhere?


>+class nsDOMNotifyAudioAvailableEvent : public nsDOMEvent,
>+                                       public nsIDOMNotifyAudioAvailableEvent
>+{
>+public:
>+  nsDOMNotifyAudioAvailableEvent(nsPresContext* aPresContext, nsEvent* aEvent,
>+                                 PRUint32 aEventType, float * aFrameBuffer,
>+                                 PRUint32 aFrameBufferLength, float aTime);
>+
>+  NS_DECL_ISUPPORTS_INHERITED
>+  NS_DECL_NSIDOMNOTIFYAUDIOAVAILABLEEVENT
>+
>+  // Forward to base class
>+  NS_FORWARD_TO_NSDOMEVENT
>+
>+  nsresult NS_NewDOMAudioAvailableEvent(nsIDOMEvent** aInstancePtrResult,
>+                                        nsPresContext* aPresContext,
>+                                        nsEvent *aEvent,
>+                                        PRUint32 aEventType,
>+                                        float * aFrameBuffer,
>+                                        PRUint32 aFrameBufferLength,
>+                                        float aTime);
>+
>+  ~nsDOMNotifyAudioAvailableEvent();
>+
>+private:
>+  nsAutoArrayPtr<float> mFrameBuffer;
>+  PRUint32 mFrameBufferLength;
>+  float mTime;
>+
>+  JSRuntime* mJSRuntime;
Do you really need this?
Perhaps make it available in nsContentUtils.
nsAutoGCRoot has already it as a static member.


>+void nsHTMLMediaElement::NotifyAudioAvailable(float* aFrameBuffer,
>+                                              PRUint32 aFrameBufferLength,
>+                                              PRUint64 aTime)
>+{
>+  // Do same-origin check on element and media before allowing MozAudioAvailable events.
>+  if (!mMediaSecurityVerified) {
>+    nsresult rv = NodePrincipal()->Subsumes(GetCurrentPrincipal().get(), &mAllowAudioData);
You leak current principal here.



>+nsresult nsHTMLMediaElement::DispatchAudioAvailableEvent(float* aFrameBuffer,
>+                                                         PRUint32 aFrameBufferLength,
>+                                                         PRUint64 aTime)
>+{
>+  nsCOMPtr<nsIDOMDocumentEvent> docEvent(do_QueryInterface(GetOwnerDoc()));
>+  nsCOMPtr<nsIDOMEventTarget> target(do_QueryInterface(static_cast<nsIContent*>(this)));
>+  NS_ENSURE_TRUE(docEvent && target, NS_ERROR_INVALID_ARG);
>+
>+  nsCOMPtr<nsIDOMEvent> event;
>+  nsresult rv = docEvent->CreateEvent(NS_LITERAL_STRING("MozAudioAvailableEvent"),
>+                                      getter_AddRefs(event));
>+  nsCOMPtr<nsIDOMNotifyAudioAvailableEvent> audioavailableEvent(do_QueryInterface(event));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  rv = audioavailableEvent->InitAudioAvailableEvent(NS_LITERAL_STRING("mozaudioavailable"),
>+                                                    PR_TRUE, PR_TRUE, aFrameBuffer, aFrameBufferLength,
>+                                                    (float)aTime/1000 /* convert ms to seconds */);
So how often do we dispatch these events? If there are now MozAudioAvailable listeners, does the
event dispatch show up badly in the profiles, or should we optimize out event dispatch like
we do for example for MozAfterPaint?

>+
>+#if !defined(nsAudioAvailableEventManager_h__)
#ifndef

>+    # Audio
>+    'nsIDOMNotifyAudioAvailableEvent.frameBuffer',
>+    'nsIDOMHTMLAudioElement.mozWriteAudio',
Why only these? Why not .* ? or at least
all the properties and methods you've added?

I should still re-read this but getting a bit late here.
r- because of memory leaks.
Please try to run with a debug build and env variable XPCOM_MEM_LEAK_LOG.
Attachment #459030 - Flags: review?(Olli.Pettay) → review-
Attached patch Patch (v. 15e) (obsolete) — Splinter Review
>>+GK_ATOM(audioavailable, "mozaudioavailable")
>Do you actually use this somewhere?

I followed volumechange/onvolumechange.  You tell me if I should remove one; it's not clear from following existing code in the tree (only one of these has internal users).

>>+class nsDOMNotifyAudioAvailableEvent : public nsDOMEvent,
>>+                                       public nsIDOMNotifyAudioAvailableEvent
>>+  JSRuntime* mJSRuntime;
>Do you really need this?
>Perhaps make it available in nsContentUtils.
>nsAutoGCRoot has already it as a static member.

I don't totally follow this, and need some more time to read that code, and maybe chat with you on irc.

>You leak current principal here.

Fixed the leak.

>So how often do we dispatch these events? If there are now MozAudioAvailable
>listeners, does the event dispatch show up badly in the profiles or should be
> optimize out event dispatch like we do for example for MozAfterPaint?

The number of events per second will vary with the audio rate, number of channels, and framebuffer length chosen (512 - 32K, with a default of 1024 per channel), but typically (44100 Hz with 2 channels) this will mean about 43 MozAudioAvailable events per second.  Is there a preferred way for me to profile?  I usually use Shark, but could do something else if you let me know what you'd like to see.
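Worked out with those typical values, the event-rate estimate looks like this:

var sampleRate = 44100;                     // samples per second, per channel
var channels = 2;
var frameBufferLength = 1024 * channels;    // default: 1024 samples per channel
var eventsPerSecond = (sampleRate * channels) / frameBufferLength;  // 88200 / 2048 ≈ 43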

>>+#if !defined(nsAudioAvailableEventManager_h__)
>#ifndef

Fixed.

>>+    # Audio
>>+    'nsIDOMNotifyAudioAvailableEvent.frameBuffer',
>>+    'nsIDOMHTMLAudioElement.mozWriteAudio',
>Why only these? Why not .* ? or at least
>all the properties and methods you've added?

This patch adds .time, but mozCurrentSampleOffset() is not working -- the generated quickstub looks like this:

    !; // TODO - Convert `result` to jsval, store in rval.
    return xpc_qsThrow(cx, NS_ERROR_UNEXPECTED); // FIXME

I am not sure, but I wonder if unsigned long long is not supported?
Attachment #459030 - Attachment is obsolete: true
Attachment #459302 - Flags: review?(kinetik)
Attachment #459302 - Flags: review?(Olli.Pettay)
Attachment #459030 - Flags: review?(kinetik)
..also, thanks a lot for your quick review, really appreciated.
Indeed long long isn't quickstubbed yet.

Profiling using shark is the way I do it.
I should have said:
"If there are *no* MozAudioAvailable listeners, does the
event dispatch show up badly in the profiles"
Here's a report for a webm playing, no MozAudioAvailable listeners.  See specifically the bits introduced by this patch, namely:

- 0.0%, nsAudioAvailableEventRunner::Run(), XUL
- 0.0%, nsHTMLMediaElement::NotifyAudioAvailable(float*, unsigned int, unsigned long long), XUL
- 0.0%, nsHTMLMediaElement::DispatchAudioAvailableEvent(float*, unsigned int, unsigned long long), XUL
- 0.0%, nsBuiltinDecoder::AudioAvailable(float*, unsigned int, unsigned long long), XUL
- 0.0%, nsAudioAvailableEventManager::DispatchPendingEvents(unsigned long long), XUL

Looking at a profile for all threads is similar.  If there's a better test I can do, let me know and I'll do that, too.
What about playing some wav file? That wouldn't spend so much time in
decoding, I think.
The wave decoder doesn't produce MozAudioAvailable events currently -- we plugged into the nsBuiltin* base classes, and wave doesn't use them yet.  When it gets refactored, it will.  I can do an ogg audio, though.
Here's just an audio (.ogg) playing, and specifically:

+ 0.6%, __memcpy, libSystem.B.dylib
| + 0.1%, nsAudioAvailableEventManager::QueueWrittenAudioData(float*, unsigned int, unsigned long long), XUL
| | - 0.1%, nsBuiltinDecoderStateMachine::AudioLoop(), XUL
| - 0.1%, audio_callback, XUL
| - 0.0%, TimerThread::Run(), XUL
| - 0.0%, TimerThread::AddTimerInternal(nsTimerImpl*), XUL
| - 0.0%, nsAudioAvailableEventManager::DispatchPendingEvents(unsigned long long), XUL
| - 0.0%, CGEventGetEventRecord, CoreGraphics
| - 0.0%, _DPSNextEvent, AppKit
| - 0.0%, _cairo_pattern_init_static_copy, XUL
| - 0.0%, -[NSRectSet initWithCopyOfRects:count:bounds:], AppKit
| - 0.0%, -[NSEvent _cgsEventRecord], AppKit
| - 0.0%, -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:], AppKit

- 0.1%, nsBuiltinDecoder::AudioAvailable(float*, unsigned int, unsigned long long), XUL

- 0.0%, nsHTMLMediaElement::NotifyAudioAvailable(float*, unsigned int, unsigned long long), XUL

- 0.0%, nsAudioAvailableEventRunner::Run(), XUL
I've removed that atom and finished the code to optimize out MozAudioAvailable events when there are no listeners.  Now I am just trying to rewrite the JSRuntime stuff you pointed out above.  I felt like I was getting somewhere, then spoke to peterv on irc.  I'll keep working at it, but if others can short-circuit my work, here's what he said:

13:14 < humph> I'm caching a typed array after the first call in an event object
13:14 < humph> (jsval)
13:15 < humph> I was doing something like this: http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/IDBEvents.cpp#392
13:17 < peterv> hmm, I don't know if you can use nsAutoGCRoot there
13:17 < humph> I don't think I can, since it's only meant for the stack
13:17 < peterv> right
13:17 < humph> I think I need it to live the lifetime of the event
13:18 < humph> which is why I was holding onto the runtime, and killing it off in the dtor
13:18 < peterv> can script get at the array?
13:18 < humph> yes, it's meant for script (dom event object)
13:18 < peterv> hmm
13:19 < humph> what if I use JS_AddValueRoot and hold the jscontext?
13:19 < peterv> going to need cycle collection then
13:20 < peterv> humph: my advice would be to implement cycle collection for the event
13:21 < humph> how does one do that?
13:21 < peterv> humph: and use NS_HOLD_JS_OBJECTS/NS_DROP_JS_OBJECTS
13:22 < peterv> humph: I think nsJSScriptTimeoutHandler does exactly that
13:23 < peterv> humph: you need at least http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSTimeoutHandler.cpp#116
13:23 < peterv> humph: but its more complicated since you inherit from nsDOMEvent
13:23 < humph> of course it is :)
13:25 < peterv> you wouldn't need the runtime, but mCachedArray should be a JSObject*
13:28 < humph> peterv: OK, I understand only a bit of what you've said.  I'll start reading through this stuff
13:28 < humph> peterv: thanks for your help
13:28 < humph> I wish I knew what else I need to ask
13:28 < peterv> humph: you need to add NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_AMBIGUOUS(...) to your class 
13:29 < humph> OK
13:29 < peterv> humph: and a copy of that block I pointed you too
13:29 < peterv> humph: when you create your array you need to call HOLD_JS_OBJECTS
13:30 < peterv> humph: nsJSScriptTimeoutHandler should get you somewhat on the way
13:30 < peterv> humph: I'll try to figure out what you need to do differently since you inherit from nsDOMEvent
This cycle collector stuff has beaten me.  Here's what I have, followed by my link error, if anyone can show me the folly of my ways:

nsDOMNotifyAudioAvailableEvent.h:
---------------------------------

NS_DECL_CYCLE_COLLECTING_ISUPPORTS

NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS(nsDOMNotifyAudioAvailableEvent, nsIDOMNotifyAudioAvailableEvent)

NS_DECL_NSIDOMNOTIFYAUDIOAVAILABLEEVENT

NS_FORWARD_TO_NSDOMEVENT



nsDOMNotifyAudioAvailableEvent.cpp:
-----------------------------------

NS_IMPL_CYCLE_COLLECTION_CLASS(nsDOMNotifyAudioAvailableEvent)

DOMCI_DATA(NotifyAudioAvailableEvent, nsDOMNotifyAudioAvailableEvent)

NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(nsDOMNotifyAudioAvailableEvent)
  NS_INTERFACE_MAP_ENTRY(nsIDOMNotifyAudioAvailableEvent)
  NS_INTERFACE_MAP_ENTRY_AMBIGUOUS(nsISupports, nsIDOMNotifyAudioAvailableEvent)
  NS_DOM_INTERFACE_MAP_ENTRY_CLASSINFO(NotifyAudioAvailableEvent)
NS_INTERFACE_MAP_END_INHERITING(nsDOMEvent)

NS_IMPL_CYCLE_COLLECTING_ADDREF_AMBIGUOUS(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
NS_IMPL_CYCLE_COLLECTING_RELEASE_AMBIGUOUS(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)



Build error:
------------

c++  -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -pedantic -Wno-long-long -fno-strict-aliasing -fshort-wchar -pthread -pipe  -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer  -fPIC -shared -Wl,-z,defs -Wl,-h,libxul.so -o libxul.so  nsStaticXULComponents.o nsUnicharUtils.o nsBidiUtils.o nsRDFResource.o     -lpthread   -Wl,-rpath-link,/home/dave/moz/audio15g/objdir-release/dist/bin -Wl,-rpath-link,/usr/local/lib  -Wl,--whole-archive ../../embedding/browser/gtk/src/libgtkembedmoz.a ../../toolkit/xre/libxulapp_s.a  ../../staticlib/components/libnecko.a ../../staticlib/components/libuconv.a ../../staticlib/components/libi18n.a ../../staticlib/components/libchardet.a ../../staticlib/components/libjar50.a ../../staticlib/components/libpref.a ../../staticlib/components/libhtmlpars.a ../../staticlib/components/libimglib2.a ../../staticlib/components/libgklayout.a ../../staticlib/components/libdocshell.a ../../staticlib/components/libembedcomponents.a ../../staticlib/components/libwebbrwsr.a ../../staticlib/components/libnsappshell.a ../../staticlib/components/libtxmgr.a ../../staticlib/components/libcommandlines.a ../../staticlib/components/libextensions.a ../../staticlib/components/libtoolkitcomps.a ../../staticlib/components/libpipboot.a ../../staticlib/components/libpipnss.a ../../staticlib/components/libappcomps.a ../../staticlib/components/libjetpack_s.a ../../staticlib/components/libjsctypes.a ../../staticlib/components/libgkplugin.a ../../staticlib/components/libunixproxy.a ../../staticlib/components/libjsd.a ../../staticlib/components/libautoconfig.a ../../staticlib/components/libauth.a ../../staticlib/components/libcookie.a ../../staticlib/components/libpermissions.a ../../staticlib/components/libuniversalchardet.a ../../staticlib/components/libcomposer.a ../../staticlib/components/librdf.a ../../staticlib/components/libwindowds.a ../../staticlib/components/libfileview.a ../../staticlib/components/libstoragecomps.a ../../staticlib/components/libplaces.a ../../staticlib/components/libtkautocomplete.a ../../staticlib/components/libsatchel.a ../../staticlib/components/libpippki.a ../../staticlib/components/libwidget_gtk2.a ../../staticlib/components/libsystem-pref.a ../../staticlib/components/libimgicon.a ../../staticlib/components/libgkgfxthebes.a ../../staticlib/components/libaccessibility.a ../../staticlib/components/libremoteservice.a ../../staticlib/components/libspellchecker.a ../../staticlib/components/libzipwriter.a ../../staticlib/libjsipc_s.a ../../staticlib/libdomipc_s.a ../../staticlib/libdomplugins_s.a ../../staticlib/libmozipc_s.a ../../staticlib/libmozipdlgen_s.a ../../staticlib/libipcshell_s.a ../../staticlib/libgfxipc_s.a ../../staticlib/libxpcom_core.a ../../staticlib/libucvutil_s.a ../../staticlib/libgkgfx.a ../../staticlib/libchromium_s.a ../../staticlib/libmozreg_s.a ../../staticlib/libmorkreader_s.a ../../staticlib/libgtkxtbin.a ../../staticlib/libthebes.a ../../staticlib/libycbcr.a  -Wl,--no-whole-archive -L../../dist/lib -lmozsqlite3 -L../../dist/bin -L../../dist/lib -L../../jpeg -lmozjpeg -L../../modules/libimg/png -lmozpng ../../gfx/qcms/libmozqcms.a -L/home/dave/moz/audio15g/objdir-release/dist/bin -lmozjs -L../../dist/bin -L../../dist/lib -lcrmf -lsmime3 -lssl3 -lnss3 -lnssutil3 ../../gfx/cairo/cairo/src/libmozcairo.a ../../gfx/cairo/libpixman/src/libmozlibpixman.a   -lXrender -lfreetype -lfontconfig 
../../gfx/harfbuzz/src/libmozharfbuzz.a  -L../../modules/zlib/src -lmozz -lasound   -lrt -L../../dist/bin -L../../dist/lib  -L/home/dave/moz/audio15g/objdir-release/dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl -L../../dist/bin -lmozalloc -L/lib64 -ldbus-1 -lpthread -lrt   -pthread -L/lib64 -ldbus-glib-1 -ldbus-1 -lpthread -lgobject-2.0 -lgthread-2.0 -lrt -lglib-2.0    -lX11  -lXext  -pthread -lpangoft2-1.0 -lpango-1.0 -lfreetype -lfontconfig -lgobject-2.0 -lgmodule-2.0 -lgthread-2.0 -lrt -lglib-2.0   -pthread -lgtk-x11-2.0 -latk-1.0 -lgio-2.0 -lpangoft2-1.0 -lfreetype -lfontconfig -lgdk-x11-2.0 -lgdk_pixbuf-2.0 -lpangocairo-1.0 -lpango-1.0 -lcairo -lgobject-2.0 -lgmodule-2.0 -lgthread-2.0 -lrt -lglib-2.0   -lXt -lgthread-2.0 -lfreetype -ldl -lm  -lrt    
../../staticlib/components/libgklayout.a(nsDOMNotifyAudioAvailableEvent.o): In function `global constructors keyed to nsDOMNotifyAudioAvailableEvent.cpp':
nsDOMNotifyAudioAvailableEvent.cpp:(.text+0x11): undefined reference to `vtable for nsDOMNotifyAudioAvailableEvent::cycleCollection'
/usr/bin/ld: ../../staticlib/components/libgklayout.a(nsDOMNotifyAudioAvailableEvent.o): relocation R_X86_64_PC32 against undefined hidden symbol `vtable for nsDOMNotifyAudioAvailableEvent::cycleCollection' can not be used when making a shared object
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
Attached patch Patch (v. 15i) (obsolete) — Splinter Review
This removes the explicit use of JSRuntime by fixing the cycle collector issues above, removes the unnecessary atom, adds a check for MozAudioAvailable listeners before dispatching events, fixes some minor compiler warnings with initialization order, and updates to trunk (eb480da7ffd5).
Attachment #459302 - Attachment is obsolete: true
Attachment #459990 - Flags: review?(kinetik)
Attachment #459990 - Flags: review?(Olli.Pettay)
Attachment #459302 - Flags: review?(kinetik)
Attachment #459302 - Flags: review?(Olli.Pettay)
How do I attach an event handler for mozAudioAvailable? I've tried
<audio src="song.ogg"
  onloadedmetadata="loadedMetadata(event);"
  onaudiowritten="audioWritten(event);"
  onmozaudioavailable="audioWritten(event);"
  onaudioavailable="audioWritten(event);"
  id="song"
  autoplay="true">
</audio>
and the handler just doesn't seem to fire. Any suggestions as to what I might be doing wrong?
Tim,

See https://wiki.mozilla.org/Audio_Data_API_Review_Version#API_Tutorial, but basically:

<audio src="song.ogg" onmozaudioavailable="audioAvailable(event);"></audio>
...
function audioAvailable(event) {
  // event.frameBuffer or event.time
}

I made some try-server builds of the latest patch on trunk, and you can grab the here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/david.humphrey@senecac.on.ca-d886b29dcd0b/
My code worked perfectly before the most recent change, but now the handler does not even get called. As I said before, I have tried the attributes on the audio tag but they do not work. I also tried audioElement.addEventListener("mozaudioavailable", audioAvailable, false);
I also tested it on Mac, Windows XP, and Linux.
with this code, 
<audio src="song.ogg" onmozaudioavailable="audioAvailable(event);"  onaudiowritten="audioAvailable(event)" onended="alert(availCalled)" autoplay="true" controls="true"></audio>
<script>
var availCalled=false;
function audioAvailable(event) {
availCalled=true;
}
</script>

when the song ends, it alerts false in the most recent builds, whereas in the not so recent builds, it alerts true
Without seeing your full code, I'm not sure (maybe you're hitting the new same-origin check, which means no events get fired?).  Probably best to find me on irc so we can discuss your case; the code itself works (our tests all pass, our demos work).
FYI - Corban and Yury have the start of a JS based implementation of the proposed node-based API being discussed on the audio working group list.  They are doing it on top of this API, and it's able to do quite a bit already.  See: 

http://github.com/corbanbrook/audionode.js
RE: "handling for cross-origin media"...

No errors are thrown when I attempt to get data cross-origin. When I try to do the same with canvas, I get a security error (error 1000, I believe). As a JavaScript developer who uses the browser to debug, this could potentially be extremely confusing and time-consuming.

Does anybody know if there are plans to include security errors for cross origin handling?
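For comparison, this is the canvas behaviour being referred to (illustrative sketch; the URL is made up): reading pixels back from a canvas tainted by a cross-origin image throws a security error rather than failing silently.

var img = new Image();
img.onload = function () {
  var ctx = document.createElement("canvas").getContext("2d");
  ctx.drawImage(img, 0, 0);
  try {
    ctx.getImageData(0, 0, 1, 1);     // throws once the canvas is tainted
  } catch (e) {
    // the security error surfaces here, so the developer sees the problem
  }
};
img.src = "http://other-origin.example/picture.png";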
This is part of the current patch up for review, so I'm still waiting to hear what the recommended way is.  When I wrote this, I was told to consider either a) not dispatching the event; b) throw when event.frameBuffer is called.  I opted for a), but could do b) instead.

I'll wait for the rest of the review comments on this version before I change it.
I put the file up at http://www.timando.net/audio/fft.html If you go there, the demo works, but if you download http://www.timando.net/audio.zip and extract it, the demo stops working. Any ideas?
(In reply to comment #119)
> I put the file up at http://www.timando.net/audio/fft.html If you go there, the
> demo works, but if you download http://www.timando.net/audio.zip and extract
> it, the demo stops working. Any ideas?

If it's not working on local files, you may need to set security.fileuri.strict_origin_policy to false in about:config, keeping in mind the security implications of doing so.
(In reply to comment #118)
> This is part of the current patch up for review, so I'm still waiting to hear
> what the recommended way is.  When I wrote this, I was told to consider either
> a) not dispatching the event; b) throw when event.frameBuffer is called.  I
> opted for a), but could do b) instead.
> 
> I'll wait for the rest of the review comments on this version before I change
> it.

I'd recommend throwing an error. That way it's easier to see that there is a problem rather than wondering why your handler doesn't fire.
Comment on attachment 459990 [details] [diff] [review]
Patch (v. 15i)

>+GK_ATOM(onmozaudioavailable, "onmozaudioavailable")
onMozAudioAvailable

> #ifdef MOZ_MEDIA
>   "loadstart", "progress", "suspend", "emptied", "stalled", "play", "pause",
>   "loadedmetadata", "loadeddata", "waiting", "playing", "canplay",
>   "canplaythrough", "seeking", "seeked", "timeupdate", "ended", "ratechange",
>-  "durationchange", "volumechange",
>+  "durationchange", "volumechange", "mozaudioavailable",
MozAudioAvailable



>+NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN(nsDOMNotifyAudioAvailableEvent)
>+  NS_IMPL_CYCLE_COLLECTION_TRACE_JS_MEMBER_CALLBACK(mCachedArray)
>+NS_IMPL_CYCLE_COLLECTION_TRACE_END
I wish Peterv could review mCachedArray handling

>+private:
>+  nsAutoArrayPtr<float> mFrameBuffer;
>+  PRUint32 mFrameBufferLength;
>+  float mTime;
>+  JSObject* mCachedArray;
>+  PRPackedBool mCachedArrayInitialized;
Since there is only one boolean member, you could use PRBool


>+++ b/content/events/src/nsEventListenerManager.cpp
>@@ -486,16 +486,21 @@ nsEventListenerManager::AddEventListener
>       window->SetMutationListeners((aType == NS_MUTATION_SUBTREEMODIFIED) ?
>                                    kAllMutationBits :
>                                    MutationBitForEventType(aType));
>     }
>   } else if (aTypeAtom == nsGkAtoms::onMozOrientation) {
>     nsPIDOMWindow* window = GetInnerWindowForTarget();
>     if (window)
>       window->SetHasOrientationEventListener();
>+  } else if (aType == NS_MOZAUDIOAVAILABLE) {
>+    nsPIDOMWindow* window = GetInnerWindowForTarget();
>+    if (window) {
>+      window->SetHasAudioAvailableListeners();
>+    }
Note, you also need to update nsNodeUtils, so that when an audio element
is moved from one document to another, the events are still fired correctly.
And please add a testcase for that.

>+PRBool nsHTMLMediaElement::HasAudioAvailableListeners()
>+{
>+  // Determine if the current element is focused, if it is not focused
>+  // then we should not try to blur
>+  nsIDocument *document = GetDocument();
>+  if (!document) {
>+    return PR_FALSE;
>+  }
>+
>+  nsCOMPtr<nsPIDOMWindow> window = do_QueryInterface(document->GetInnerWindow());
No need for nsCOMPtr, nsPIDOMWindow* is good enough. (No need for QI either)

>+  if (!window) {
>+    return PR_FALSE;
>+  }
>+
>+  return window->HasAudioAvailableListeners();
You could write this
  return window && window->HasAudioAvailableListeners();


>+NS_IMETHODIMP
>+nsHTMLMediaElement::GetMozChannels(PRUint32 *aMozChannels)
>+{
>+   if (mHasAudioStream)
>+     *aMozChannels = mChannels;
>+   else
>+     *aMozChannels = mDecoder ? mChannels : 0;
>+
>+   return NS_OK;
>+}
Strange 3 space indentation

>+  NS_IMETHOD Run() {
'{' should be in the next line

>+    mDecoder->AudioAvailable(mFrameBuffer.forget(), mFrameBufferLength, mTime);
>+    return NS_OK;
>+  }
>+  const PRUint32 mFrameBufferLength;
>+  const PRUint64 mTime;
>+};


>+void nsAudioAvailableEventManager::QueueWrittenAudioData(float* aAudioData,
>+                                                         PRUint32 aAudioDataLength,
>+                                                         PRUint64 aAudioEndTime)
>+{
>+  PRUint32 currentBufferSize = mDecoder->GetFrameBufferLength();
>+  if (!mSignalBuffer ||
>+      (mSignalBufferPosition == 0 && mSignalBufferLength != currentBufferSize)) {
>+    if (!mSignalBuffer || (mSignalBufferLength < currentBufferSize)) {
>+      // Only resize if buffer is empty or smaller.
>+      mSignalBuffer = new float[currentBufferSize];
>+    }
>+    mSignalBufferLength = currentBufferSize;
>+  }
>+
>+  float *  audioData = aAudioData;
Extra space before and after *

>+  NS_IMETHOD Run() {
{ should be in the next line
>+    mDecoder->MetadataLoaded(mChannels, mRate, mFrameBufferLength);
>+    return NS_OK;
>+  }


>+function audioEnded() {
>+  // Give a second for any data/events to clear before testing.
>+  setTimeout(checkResults, 1000);
>+}
This looks error-prone. Could there be any better way to check this?
Timeouts tend to cause random test failures. (There can be cycle collection happening, or
the tinderbox machine can be busy doing something else, so perhaps not much gets done in a second.)


>+interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
>+{
>+  [implicit_jscontext]
>+  readonly attribute jsval  frameBuffer;
>+
>+  readonly attribute float  time;
>+
>+  [noscript] void initAudioAvailableEvent(in DOMString typeArg,
>+                                          in boolean canBubbleArg,
>+                                          in boolean cancelableArg,
>+                                          [array, size_is(frameBufferLength)] in float frameBufferPtr,
>+                                          in unsigned long frameBufferLength,
>+                                          in float time);
Why isn't this scriptable?

r- mainly because of nsNodeUtils thing. And I'm not familiar with all the audio handling code, so
kinetik should review those.
Attachment #459990 - Flags: review?(Olli.Pettay) → review-
Comment on attachment 459990 [details] [diff] [review]
Patch (v. 15i)

I'll take a look at the CC code (probably tomorrow).
Attachment #459990 - Flags: review?(peterv)
Olli, thanks for going over this so carefully.  I'm going to work-up a new patch now based on your feedback.

Peter, that would be great, thanks a lot.

Matt, have you got an ETA for review comments on the audio bits?  It's been a while and I'm really hoping to get this into ff4.  If you can't get this done soon, let me know so I can switch to someone else.
(In reply to comment #122)
> >+GK_ATOM(onmozaudioavailable, "onmozaudioavailable")
> onMozAudioAvailable

I can't find anyone on irc who can answer this, so I'll ask here.  Which one do I do?

GK_ATOM(onMozAudioAvailable, "onMozAudioAvailable")

or

GK_ATOM(onmozaudioavailable, "onMozAudioAvailable")
Attached patch Patch (v. 15j) (obsolete) — Splinter Review
This fixes the review comments above.  I've also changed things so you can get events for an Audio() that isn't inserted in the DOM (e.g., adding an event listener to an Audio made with script), and changed the origin check to throw when accessing event.frameBuffer instead of not dispatching (and therefore failing silently).

You also say to write a test case for "when an audio element is moved from one document to another [ensuring] the events are still fired correctly."  I don't understand what you mean well enough to write this test; could you give me more details?
Attachment #459990 - Attachment is obsolete: true
Attachment #461330 - Flags: review?(peterv)
Attachment #461330 - Flags: review?(kinetik)
Attachment #461330 - Flags: review?(Olli.Pettay)
Attachment #459990 - Flags: review?(peterv)
Attachment #459990 - Flags: review?(kinetik)
(In reply to comment #126)
> You also say to write a test case for "when an audio element is moved from one
> document to another [insuring] the events are still fired correctly."  I don't
> know what you mean here enough to write this test, if you could give me more
> details.
If you have 2 documents, and first one has <audio> and then you do something like
theOtherDocument.adoptNode(audioElement);
Comment on attachment 461330 [details] [diff] [review]
Patch (v. 15j)

>diff --git a/content/base/src/nsGkAtomList.h b/content/base/src/nsGkAtomList.h
>--- a/content/base/src/nsGkAtomList.h
>+++ b/content/base/src/nsGkAtomList.h
>@@ -1645,16 +1645,17 @@ GK_ATOM(oncanplay, "oncanplay")
> GK_ATOM(oncanplaythrough, "oncanplaythrough")
> GK_ATOM(onseeking, "onseeking")
> GK_ATOM(onseeked, "onseeked")
> GK_ATOM(ontimeupdate, "ontimeupdate")
> GK_ATOM(onended, "onended")
> GK_ATOM(onratechange, "onratechange")
> GK_ATOM(ondurationchange, "ondurationchange")
> GK_ATOM(onvolumechange, "onvolumechange")
>+GK_ATOM(onMozAudioAvailable, "onmozaudioavailable")
Ok, I saw in another bug that there is a problem to actually handle
onMozAudioAvailable attribute. (A DOM 3 Events or HTML5 spec issue)
Could we just not support onMozAudioAvailable attribute on the elements for now?
.addEventListener fits in so much better to adding experimental events.

>diff --git a/content/base/src/nsContentUtils.cpp b/content/base/src/nsContentUtils.cpp
>--- a/content/base/src/nsContentUtils.cpp
>+++ b/content/base/src/nsContentUtils.cpp
>@@ -596,16 +596,17 @@ nsContentUtils::InitializeEventTable() {
>     { nsGkAtoms::oncanplaythrough,              NS_CANPLAYTHROUGH, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeking,                     NS_SEEKING, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeked,                      NS_SEEKED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ontimeupdate,                  NS_TIMEUPDATE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onended,                       NS_ENDED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onratechange,                  NS_RATECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ondurationchange,              NS_DURATIONCHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onvolumechange,                NS_VOLUMECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>+    { nsGkAtoms::onMozAudioAvailable,           NS_MOZAUDIOAVAILABLE, EventNameType_HTML, NS_MOZAUDIOAVAILABLE },

In that case EventNameType_HTML -> EventNameType_None

I'd still like to see tests for moving <audio> from one document to another.
You could for example have an iframe in the test and move <audio> from the main document to the
iframe.contentDocument.
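Something along these lines is what's being asked for (illustrative only; ids are made up, using the usual mochitest ok()/SimpleTest helpers):

var audio = document.getElementById("test-audio");
var gotEvent = false;
audio.addEventListener("MozAudioAvailable", function () { gotEvent = true; }, false);

// Move the element into the iframe's document, then make sure events still fire.
var frameDoc = document.getElementById("test-frame").contentDocument;
frameDoc.adoptNode(audio);
frameDoc.body.appendChild(audio);
audio.play();

audio.addEventListener("ended", function () {
  ok(gotEvent, "MozAudioAvailable still fires after adoptNode");
  SimpleTest.finish();
}, false);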
Attachment #461330 - Flags: review?(Olli.Pettay) → review-
Attached patch Patch (v. 16a) (obsolete) — Splinter Review
This updates to trunk (adjustments for scroll events and fatvals landing), removes support for the onmozaudioavailable attribute (one must use addEventListener("MozAudioAvailable"...) now), switches EventNameType_HTML -> EventNameType_None, adds NS_MOZAUDIOAVAILABLE_EVENT to match style of other events, and adds a test for moving <audio> from one document to another.
Attachment #461330 - Attachment is obsolete: true
Attachment #462156 - Flags: review?(peterv)
Attachment #462156 - Flags: review?(kinetik)
Attachment #462156 - Flags: review?(Olli.Pettay)
Attachment #461330 - Flags: review?(peterv)
Attachment #461330 - Flags: review?(kinetik)
Comment on attachment 462156 [details] [diff] [review]
Patch (v. 16a)

>diff --git a/content/base/src/nsContentUtils.cpp b/content/base/src/nsContentUtils.cpp
>--- a/content/base/src/nsContentUtils.cpp
>+++ b/content/base/src/nsContentUtils.cpp
>@@ -599,16 +599,17 @@ nsContentUtils::InitializeEventTable() {
>     { nsGkAtoms::oncanplaythrough,              NS_CANPLAYTHROUGH, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeking,                     NS_SEEKING, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onseeked,                      NS_SEEKED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ontimeupdate,                  NS_TIMEUPDATE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onended,                       NS_ENDED, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onratechange,                  NS_RATECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::ondurationchange,              NS_DURATIONCHANGE, EventNameType_HTML, NS_EVENT_NULL },
>     { nsGkAtoms::onvolumechange,                NS_VOLUMECHANGE, EventNameType_HTML, NS_EVENT_NULL },
>+    { nsGkAtoms::onMozAudioAvailable,           NS_MOZAUDIOAVAILABLE, EventNameType_None, NS_MOZAUDIOAVAILABLE_EVENT },


>diff --git a/widget/public/nsGUIEvent.h b/widget/public/nsGUIEvent.h
>--- a/widget/public/nsGUIEvent.h
>+++ b/widget/public/nsGUIEvent.h
>@@ -110,16 +110,17 @@ class nsHashKey;
> #define NS_DRAG_EVENT                     35
> #define NS_NOTIFYPAINT_EVENT              36
> #define NS_SIMPLE_GESTURE_EVENT           37
> #define NS_SELECTION_EVENT                38
> #define NS_CONTENT_COMMAND_EVENT          39
> #define NS_GESTURENOTIFY_EVENT            40
> #define NS_UISTATECHANGE_EVENT            41
> #define NS_MOZTOUCH_EVENT                 42
>+#define NS_MOZAUDIOAVAILABLE_EVENT        43
Why this?
You're not adding a new "nsEvent" struct for audio events.
You should be able to use NS_EVENT_NULL for the struct type in the InitializeEventTable.
Attachment #462156 - Flags: review?(Olli.Pettay) → review+
Attached patch Patch (v. 16b) (obsolete) — Splinter Review
This addresses the NS_EVENT_NULL comment above, carrying forward the r+ for everything else.
Attachment #462235 - Flags: superreview?(vladimir)
Attachment #462156 - Flags: review?(kinetik)
Comment on attachment 462235 [details] [diff] [review]
Patch (v. 16b)

+  if (NS_FAILED(rv)) {
+    return rv;
+  }

Shutdown() and free mAudioStream?

+  mHasAudioStream = PR_TRUE;

Can you use a NULL test of mAudioStream for this instead of
maintaining extra state?  Could wrap it up in a HasAudioStream()
getter function.

+  PRUint32 available = mAudioStream->Available();
+  PRUint32 writeLen = dataLength;
+  if (writeLen > available) {
+    writeLen = available;
+  }

This can just be:

PRUint32 writeLen = NS_MIN(mAudioStream->Available(), dataLength);

+  nsresult rv = mAudioStream->Write((float*) tsrc->data, writeLen, PR_TRUE);

No need for a float* cast, Write() takes a const void*.  Also,
when casts are required the preferred form is constructor-style
casts for numeric types and static_cast for anything else.

+  if (mHasAudioStream)
+    *aMozChannels = mChannels;
+  else
+    *aMozChannels = mDecoder ? mChannels : 0;

If there's no audio stream and no decoder, should this return an error
rather than success with 0 channels?

Same question about GetMozSampleRate and GetMozFrameBufferLength.

+      // If a previous call to mozSetup() was made, kill that media stream
+      // in order to use this new src instead.
+      if (mAudioStream) {
+        mAudioStream->Shutdown();
+        mAudioStream = nsnull;
+        mHasAudioStream = PR_FALSE;
+      }

I don't think this handles the case where a new <source> element
is added as a child of this media element.

+    if (e->mTime <= aCurrentTime) {
+      nsCOMPtr<nsIRunnable> event = mPendingEvents[0];
+      mPendingEvents.RemoveElementAt(0);
+      NS_DispatchToMainThread(event, NS_DISPATCH_NORMAL);
+    } else {
+      break;
+    }

Reverse the test so that the break is early, then unindent the
rest of the logic.

+                    (1000 * (mSignalBufferPosition + audioDataLength)) /

Make 1000 a named constant so it's clear what unit conversion is
happening here.

+  nsBuiltinDecoder* mDecoder;
+  nsBuiltinDecoderReader* mReader;

Add a comment about the lifetime of these pointers with respect
to the event manager.  Also add comments about how the media
threads interact with the manager, and how/why the design is
thread safe.

+  PRUint32 currentBufferSize = mDecoder->GetFrameBufferLength();
+                    mReader->GetInfo().mAudioChannels / mReader->GetInfo().mAudioRate;

These can't change once they're set, so maybe pass the values in
when setting up the event manager so you don't need to hold
pointers to the reader?  You still need the decoder pointer for
the event, but it's easier to see that things are thread-safe if
it's only used for creating the event.

PRInt64 nsAudioStream::GetPosition()
 {
+  PRInt64 sampleOffset = GetSampleOffset();
+  if(sampleOffset >= 0)
+    return ((1000 * sampleOffset) / mRate / mChannels);
+  else
+    return -1;
+}

No need for the else, unindent the return.

+    return (position / sizeof(short));

Fewer parentheses, please.

+  if ( (aLength < FRAMEBUFFER_LENGTH_MIN) || (aLength > FRAMEBUFFER_LENGTH_MAX) ||
+      ((aLength & (aLength - 1)) > 0))

Too many parens here too.

+  // The framebuffer size to use for audioavailable events.
+  PRUint32 mFrameBufferLength;

This should be laid out before the packed bools.

Otherwise looks great, so r+ with those changes.
Attachment #462235 - Flags: review+
Comment on attachment 462235 [details] [diff] [review]
Patch (v. 16b)


> protected:
>   PRUint32 mMayHavePaintEventListener : 1;
>+  PRUint32 mMayHaveAudioAvailableEventListener : 1;
>   PRUint32 mMayHaveMutationListeners : 1;
>   PRUint32 mMayHaveCapturingListeners : 1;
>   PRUint32 mMayHaveSystemGroupListeners : 1;
>-  PRUint32 mNoListenerForEvent : 28;
>+  PRUint32 mNoListenerForEvent : 27;

Minor: move the new bit to just before NoListenerForEvent, ensuring that existing code doesn't change (also don't forget to change the initializer order).

Otherwise, looks fine with matthew's suggested changes!
Attachment #462235 - Flags: superreview?(vladimir) → superreview+
Depends on: 586387
Thanks Matt and Vlad for your thorough reviews.  This patch has your comments addressed.  This would be ready to go in, but we found a regression in the typed Float32 Array (see bug 586387, which has an approved patch).  As soon as that goes in, this can too.

Requesting approval for 2.0 now that this has been through all the reviews.  This patch introduces a Mozilla-specific API for audio and video media elements, and has been tested and refined over many months.  It does what it does without sacrificing the current features of the media elements, adding much requested functionality for web developers.
Attachment #465105 - Flags: approval2.0?
Final patch doesn't apply cleanly any more -- would like to land it for b4 today, someone want to do the merge?
Comment on attachment 465105 [details] [diff] [review]
Patch (v. 16c) r+ and sr+ and review fixes.

>diff --git a/content/events/src/nsDOMNotifyAudioAvailableEvent.cpp b/content/events/src/nsDOMNotifyAudioAvailableEvent.cpp

>+NS_IMPL_CYCLE_COLLECTION_CLASS(nsDOMNotifyAudioAvailableEvent)
>+NS_IMPL_CYCLE_COLLECTION_ROOT_BEGIN(nsDOMNotifyAudioAvailableEvent)

This isn't right, you need to call NS_DROP_JS_OBJECTS and set mCachedArray to null.

>+NS_IMPL_CYCLE_COLLECTION_ROOT_END
>+NS_IMPL_CYCLE_COLLECTION_UNLINK_0(nsDOMNotifyAudioAvailableEvent)
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(nsDOMNotifyAudioAvailableEvent)
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

The problem here is that you need to forward to nsDOMEvent's CC code. You also need to traverse the script objects. This should probably be:

NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
NS_IMPL_CYCLE_COLLECTION_UNLINK_END
NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS
NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

>+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(nsDOMNotifyAudioAvailableEvent)

NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED

>+  NS_INTERFACE_MAP_ENTRY(nsIDOMNotifyAudioAvailableEvent)
>+  NS_DOM_INTERFACE_MAP_ENTRY_CLASSINFO(NotifyAudioAvailableEvent)
>+  NS_INTERFACE_MAP_ENTRY_AMBIGUOUS(nsISupports, nsIDOMNotifyAudioAvailableEvent)
>+NS_INTERFACE_MAP_END_INHERITING(nsDOMEvent)
>+
>+NS_IMPL_CYCLE_COLLECTING_ADDREF_AMBIGUOUS(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
>+NS_IMPL_CYCLE_COLLECTING_RELEASE_AMBIGUOUS(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)


NS_IMPL_ADDREF_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
NS_IMPL_RELEASE_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)

>+NS_IMETHODIMP
>+nsDOMNotifyAudioAvailableEvent::GetFrameBuffer(JSContext* aCx, jsval* aResult)

>+  if (!mCachedArrayInitialized) {
>+    // Cache this array so we don't recreate on next call.
>+    NS_HOLD_JS_OBJECTS(this, nsDOMNotifyAudioAvailableEvent);
>+
>+    mCachedArray = js_CreateTypedArray(aCx, js::TypedArray::TYPE_FLOAT32, mFrameBufferLength);
>+    if (!mCachedArray) {
>+      NS_ERROR("Failed to get audio signal!");
>+      return NS_ERROR_FAILURE;
>+    }
>+
>+    mCachedArrayInitialized = PR_TRUE;

I think you don't really need mCachedArrayInitialized. Just call NS_DROP_JS_OBJECTS if js_CreateTypedArray returns null and check mCachedArray wherever you check mCachedArrayInitialized.

>diff --git a/content/events/src/nsDOMNotifyAudioAvailableEvent.h b/content/events/src/nsDOMNotifyAudioAvailableEvent.h

>+class nsDOMNotifyAudioAvailableEvent : public nsDOMEvent,
>+                                       public nsIDOMNotifyAudioAvailableEvent
>+{
>+public:
>+  nsDOMNotifyAudioAvailableEvent(nsPresContext* aPresContext, nsEvent* aEvent,
>+                                 PRUint32 aEventType, float * aFrameBuffer,
>+                                 PRUint32 aFrameBufferLength, float aTime);
>+
>+  NS_DECL_CYCLE_COLLECTING_ISUPPORTS

This should be NS_DECL_ISUPPORTS_INHERITED.
Thanks, Peter.

> >+NS_IMPL_CYCLE_COLLECTION_CLASS(nsDOMNotifyAudioAvailableEvent)
> >+NS_IMPL_CYCLE_COLLECTION_ROOT_BEGIN(nsDOMNotifyAudioAvailableEvent)
> 
> This isn't right, you need to call NS_DROP_JS_OBJECTS and set mCachedArray to
> null.

I'm confused by "this"--are you saying those lines aren't right?  This does call NS_DROP_JS_OBJECTS in the dtor:

+nsDOMNotifyAudioAvailableEvent::~nsDOMNotifyAudioAvailableEvent()
+{
+  if (mCachedArrayInitialized) {
+    NS_DROP_JS_OBJECTS(this, nsDOMNotifyAudioAvailableEvent);
+  }
+}
Peter, can you have a quick look at this and see if it does what you want?  This also fixes the audio back-end to accommodate bug 576539 landing.
(In reply to comment #137)
> Thanks, Peter.
> 
> > >+NS_IMPL_CYCLE_COLLECTION_CLASS(nsDOMNotifyAudioAvailableEvent)
> > >+NS_IMPL_CYCLE_COLLECTION_ROOT_BEGIN(nsDOMNotifyAudioAvailableEvent)
> > 
> > This isn't right, you need to call NS_DROP_JS_OBJECTS and set mCachedArray to
> > null.
> 
> I'm confused by "this"--are you saying those lines aren't right?  This does
> call NS_DROP_JS_OBJECTS in the dtor:

Sorry, I meant having nothing between ROOT_BEGIN and ROOT_END. The code in the destructor could be irrelevant if there's a cycle: if you don't unlink, we might never break the cycle, and if we never break the cycle, the destructor doesn't run.
Comment on attachment 465946 [details] [diff] [review]
Patch (v. 16d) r+/sr+/fixes + bug 576539 and CC

Now that bug 586387 is landed, this is just waiting for a nod on the CC stuff.  Peter, as soon as you have a moment to look at this, let me know if we've got it so this too can land.
Attachment #465946 - Flags: review?(peterv)
Attachment #462156 - Flags: review?(peterv)
Comment on attachment 465946 [details] [diff] [review]
Patch (v. 16d) r+/sr+/fixes + bug 576539 and CC

See comment 139, you need a non-empty _ROOT_ hook. If you do exactly what I said in comment 136 I can r+.
Attachment #465946 - Flags: review?(peterv) → review-
Fixed a performance issue I missed with the previous fixes for bug 576539 (Chris P., this is what I was discussing with you on irc today, and the fix was minor).

Spent the rest of today trying to get the cycle collector stuff right as per the comments above, and tried many, many things.  It's clear that I'm not going to stumble into the answer, and I need more, and clearer, guidance.  Lots of things seem like good ideas, but they lead down unbuildable paths or to crashes.

The code in this patch, which is the same as previous, works.  Doing what you ask above, we hit various issues:

1) If we use INHERITED with _ROOT_ it won't build:

NS_IMPL_CYCLE_COLLECTION_ROOT_BEGIN(nsDOMNotifyAudioAvailableEvent)
  if (tmp->mCachedArray) {
    NS_DROP_JS_OBJECTS(tmp, nsDOMNotifyAudioAvailableEvent);
    tmp->mCachedArray = nsnull;
  }
NS_IMPL_CYCLE_COLLECTION_ROOT_END

...

/Users/dave/moz/audio16d/content/events/src/nsDOMNotifyAudioAvailableEvent.cpp:66: error: no ‘nsresult nsDOMNotifyAudioAvailableEvent::cycleCollection::RootAndUnlinkJSObjects(void*)’ member function declared in class ‘nsDOMNotifyAudioAvailableEvent::cycleCollection’


2) To get ROOT+INHERITED to compile, we have to do the following, which crashes after 10 seconds:

nsDOMNotifyAudioAvailableEvent.h
------------------------------------------------------------------------------

#define NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_INHERITED(_class, _base_class)  \
class NS_CYCLE_COLLECTION_INNERCLASS                                                 \
 : public NS_CYCLE_COLLECTION_CLASSNAME(_base_class)                                 \
{                                                                                    \
  NS_IMETHOD RootAndUnlinkJSObjects(void *p);                                        \
  NS_IMETHOD_(void) Trace(void *p, TraceCallback cb, void *closure);                 \
public:                                                                              \
  NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED_BODY(_class, _base_class)                 \
};                                                                                   \
NS_CYCLE_COLLECTION_PARTICIPANT_INSTANCE

class nsDOMNotifyAudioAvailableEvent : public nsDOMEvent,
                                       public nsIDOMNotifyAudioAvailableEvent
{
public:
  nsDOMNotifyAudioAvailableEvent(nsPresContext* aPresContext, nsEvent* aEvent,
                                 PRUint32 aEventType, float * aFrameBuffer,
                                 PRUint32 aFrameBufferLength, float aTime);

  NS_DECL_ISUPPORTS_INHERITED
  NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)

  NS_DECL_NSIDOMNOTIFYAUDIOAVAILABLEEVENT
  NS_FORWARD_NSIDOMEVENT(nsDOMEvent::)


nsDOMNotifyAudioAvailableEvent.cpp
-------------------------------------------------------------------------------

DOMCI_DATA(NotifyAudioAvailableEvent, nsDOMNotifyAudioAvailableEvent)

NS_IMPL_CYCLE_COLLECTION_CLASS(nsDOMNotifyAudioAvailableEvent)

NS_IMPL_ADDREF_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
NS_IMPL_RELEASE_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)

NS_IMPL_CYCLE_COLLECTION_ROOT_BEGIN(nsDOMNotifyAudioAvailableEvent)
  if (tmp->mCachedArray) {
    NS_DROP_JS_OBJECTS(tmp, nsDOMNotifyAudioAvailableEvent);
    tmp->mCachedArray = nsnull;
  }
NS_IMPL_CYCLE_COLLECTION_ROOT_END

NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
  {
    tmp->mCachedArray = nsnull;
  }
NS_IMPL_CYCLE_COLLECTION_UNLINK_END

NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN_INHERITED(nsDOMNotifyAudioAvailableEvent, nsDOMEvent)
  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS
NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN(nsDOMNotifyAudioAvailableEvent)
  NS_IMPL_CYCLE_COLLECTION_TRACE_JS_MEMBER_CALLBACK(mCachedArray)
NS_IMPL_CYCLE_COLLECTION_TRACE_END

NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED(nsDOMNotifyAudioAvailableEvent)
  NS_INTERFACE_MAP_ENTRY(nsIDOMNotifyAudioAvailableEvent)
  NS_DOM_INTERFACE_MAP_ENTRY_CLASSINFO(NotifyAudioAvailableEvent)
NS_INTERFACE_MAP_END_INHERITING(nsDOMEvent)

nsDOMNotifyAudioAvailableEvent::~nsDOMNotifyAudioAvailableEvent()
{
}
Attachment #459428 - Attachment is obsolete: true
Attachment #459442 - Attachment is obsolete: true
Attachment #465946 - Attachment is obsolete: true
(In reply to comment #142)
> 1) If we use INHERITED with _ROOT_ it won't build:

Which INHERITED? Comment 136 didn't tell you to change to any of the NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED* macros. AFAICT NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS should work fine for you.

> 2) To get ROOT+INHERITED to compile, we have to do the following, which crashes
> after 10 seconds:
> 
> nsDOMNotifyAudioAvailableEvent.h
> ------------------------------------------------------------------------------
> 
> #define NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_INHERITED(_class,
> _base_class)  \

Again, I don't think this is needed. But I also don't see how this would crash. You didn't remove the NS_DROP_JS_OBJECTS from the destructor, did you? That could certainly cause you to crash.
(In reply to comment #143)
> > 2) To get ROOT+INHERITED to compile, we have to do the following,

Turns out we need to do this after all; I thought we could avoid it, but apparently not. Next time, please just post the compile error you get instead of blindly adding/removing things :-). I could have pointed you in the right direction had I known what errors you were getting.
Attached patch Additional patch (obsolete) — Splinter Review
This makes attachment 466230 [details] [diff] [review] compile for me (and fixes an annoying compiler warning that shows up a lot). Please test and ask for review on the final patch if it works.
Matt/Chris, in addition to doing what Peter suggests in attachment 466270 [details] [diff] [review], this also adds a minor fix for event performance when seeking that we discovered while testing, in case you want to take a look at nsBuiltinDecoderStateMachine and nsAudioAvailableEventManager again.
Attachment #466230 - Attachment is obsolete: true
Attachment #466270 - Attachment is obsolete: true
Attachment #466408 - Flags: review?(peterv)
Flagging 'dev-doc-needed' as per the recent request in Eric's blog--not needed yet, but will be.  We will update the docs (see https://wiki.mozilla.org/Audio_Data_API_Review_Version for the one we've been working from), but I assume this isn't a 1:1 copy/paste.
Keywords: dev-doc-needed
Updated documentation, main docs are now available at: https://wiki.mozilla.org/Audio_Data_API.  I will try to get people to update their demos linked there, too.
Comment on attachment 466408 [details] [diff] [review]
Patch (v. 16f) r+/sr+/fixes + CC fixed, seeking perf fixed

I just reviewed all the CC stuff; I hope you tested it and it didn't crash.
Attachment #466408 - Flags: review?(peterv) → review+
Thanks for turning this around so quickly, Peter.  Yes, we've tested it extensively, and the CC stuff is behaving very well.

Vlad, while we were testing and updating demos, we found an unrelated crash, which this patch fixes.  It relates to passing a non-array when an Array/Float32Array is expected.  I fixed it using the same method you use in putImageData, and Corban wrote an audio fuzzer to test other garbage--we can't crash it now.  I've also added a test for this.

The specific change is the second check below:

+NS_IMETHODIMP  
+nsHTMLAudioElement::MozWriteAudio(const jsval &aData, JSContext *aCx, PRUint32 *aRetVal)
+{              
+  if (!mAudioStream) {                                                     
+    return NS_ERROR_DOM_INVALID_STATE_ERR;
+  }           
+               
+  if (JSVAL_IS_PRIMITIVE(aData)) {                                                             
+    return NS_ERROR_DOM_TYPE_MISMATCH_ERR;
+  }

If you're OK with this, this is done and ready to get checked in.
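
For illustration, here is roughly what that check means from script. This is a hedged sketch only, assuming the mozSetup/mozWriteAudio names from the current API; the exception surfaced is the DOM type-mismatch error returned above:

var audio = new Audio();
audio.mozSetup(1, 44100);

// A typed array (or plain JS array) of samples is fine:
audio.mozWriteAudio(new Float32Array(1024));

// A primitive is now rejected with an exception instead of crashing:
try {
  audio.mozWriteAudio(42);
} catch (e) {
  // NS_ERROR_DOM_TYPE_MISMATCH_ERR surfaces here as a DOM exception
}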
Attachment #466679 - Flags: superreview?(vladimir)
Attachment #466679 - Flags: superreview?(vladimir) → superreview+
Keywords: checkin-needed
Attachment #466679 - Attachment description: Patch (v. 16g) r+/sr+/cc/seeking fixed crash in mozWriteAudio → Patch (v. 16g) r+/sr+/cc/seeking fixed crash in mozWriteAudio [checkin-needed]
Attachment #462156 - Attachment is obsolete: true
Attachment #462235 - Attachment is obsolete: true
Attachment #465105 - Attachment is obsolete: true
Attachment #466408 - Attachment is obsolete: true
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/1362f0ca86d2
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b5
I backed this out because it was causing orange in test_playback:
http://hg.mozilla.org/mozilla-central/rev/9ef027bf2120
http://hg.mozilla.org/mozilla-central/rev/eaa833618eaa
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I just grabbed a nightly (which appears to have been built after Ted pushed and before he reverted the patch): Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b5pre) Gecko/20100818 Minefield/4.0b5pre

It's very fun running the demos: https://wiki.mozilla.org/Audio_Data_API#Working_Audio_Data_Demos with a nightly Firefox.

Looking forward to seeing the patch land for good!
Spent this week tracking down the cause of the orange with this patch.  What originally looked like bug 557432 turned out to be an issue with the way we set the framebuffer length (channels * 1024) when using a 6-channel audio file (bug495794.ogg).  Our old code required a power-of-2 length, and 6 * 1024 isn't one, which left the framebuffer length at 0 and caused an infinite loop in our event queue as it tried to loop over the audio data in order to chop it up.  This led to the timeouts (if the threads were killed) or OOM.

This patch delta fixes that, doing the following:

* Ensure the number of events can't grow without bound
* Ensure that the audio metadata, and the calculated framebuffer length, are valid
* Change the requirements on the framebuffer length, so any size between 512 and 16384 (reduced from 32K) is acceptable.  We still use channels * 1024 as the default.
* Add new tests for these cases.

Chris, since we've been discussing on irc, would you mind reviewing this?  I've done a try server build, and all tests were green.
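
For context, a rough sketch of what the relaxed limits mean from script (assuming, as the wiki docs describe, that mozFrameBufferLength is set during loadedmetadata):

var media = document.getElementById("v");

media.addEventListener("loadedmetadata", function () {
  // Default stays at channels * 1024; any length from 512 to 16384 is now
  // accepted, so a 6-channel file (6 * 1024 = 6144) no longer trips the old
  // power-of-two check and leaves the length at 0.
  media.mozFrameBufferLength = 6144;
}, false);

media.addEventListener("MozAudioAvailable", function (e) {
  var samples = e.frameBuffer;  // Float32Array of mozFrameBufferLength floats
  var t = e.time;               // stream time (seconds) of the first sample
  // process / chop up samples here ...
}, false);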
Attachment #467989 - Flags: review?(chris)
Comment on attachment 467989 [details] [diff] [review]
Patch delta for testing_playback.html orange

Looks fine to me. I noted you've got a spelling mistake in a line added in your previous patch: "// are in non-decending sequence.". May as well fix it before you land it.

test_framebuffer.html and test_audio_event_adopt aren't backend independent. Can you please move those tests' Makefile entries down into the MOZ_OGG section at content/media/test/Makefile.in:235 (or make them backend independent)?
Attachment #467989 - Flags: review?(chris) → review+
Depends on: 589727
Comment on attachment 467989 [details] [diff] [review]
Patch delta for testing_playback.html orange

Thanks, Chris.  I'll post a roll-up patch that also includes your fixes as soon as bug 589727 is fixed.
Attachment #467989 - Flags: approval2.0?
Attachment #467989 - Attachment is obsolete: true
Attachment #466679 - Attachment is obsolete: true
Pushed to m-c (again):
http://hg.mozilla.org/mozilla-central/rev/081a707a76b8
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 591847
Is there a reason currentSampleOffset is a method and not a readonly property?
No, it could be a readonly property.  If this is wrong/inefficient, I can fix it if you file a new bug.
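
(For readers following along, a quick sketch of the current shape -- the offset is a method call today, which is what the question above is about:)

var audio = new Audio();
audio.mozSetup(2, 44100);
audio.mozWriteAudio(new Float32Array(4096));

// Current API: a method call...
var offset = audio.mozCurrentSampleOffset();

// ...whereas the suggestion is a readonly property, e.g.
// audio.mozCurrentSampleOffset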
MIDI audio API, anyone? This would be cool, though not strictly necessary; it would just offload a bunch of work from JS to the browser itself for audio generation (generating frequencies, etc., for you, instead of you setting up arrays).
This could be something added to the spec at http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
Yeah, MIDI would be great. Please open a new bug for that.

Note http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html is not any kind of standard. It is just a proposal.
The same is true of https://wiki.mozilla.org/Audio_Data_API
Filed bug 615946 to add worker thread support for writing audio.
I'm interested in the prospect of developing a web-based audio editing app: load an audio file, apply effects to it, cut and rearrange sections of the audio, etc.

As far as I can tell, this sort of use case is not yet feasible, since audio data can only be accessed during playback, in real time.  Are there any plans to allow random read access to the full audio data?

Failing that, it would be nice to have some way to process the whole audio data stream as fast as possible.  The same existing MozAudioAvailable events would be used, but a new one would fire as soon as the previous one finished.  I'm thinking this sort of action could be initiated through a new .process() method on nsIDOMHTMLAudioElement, analogous to .play(), except that the audio would not be sent to the speakers and would only be used for processing via MozAudioAvailable events.

Also, are there any plans for making microphone input accessible?
> Also are there any plans for making microphone input accessible?

That's bug 591976.
(In reply to comment #164)
> As far as I can tell, this sort of use case is not yet feasible, since audio
> data can only be accessed during playback, in realtime.  Is there any plans to
> possibly allow random read access to the full audio data?

Please file a new bug for this; it sounds like it'd be a useful addition.
For generating audio, I think it would be highly desirable to be able to specify the size of the frame buffer (mozFrameBufferLength). When generating short sounds (e.g. clicks), a buffer length of several thousand samples means that time is needlessly wasted outputting zeros.

When generating audio, I would expect events to be dispatched at the onset and termination of playback. If output is guaranteed to begin immediately after the call to mozWriteAudio, then the first event isn't essential, but having an event to indicate when playback has finished is much more convenient than having to poll the audio object using a timeout.
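
A rough sketch of the polling workaround described above, under current API assumptions (mozSetup / mozWriteAudio / mozCurrentSampleOffset; the click waveform is purely illustrative):

var audio = new Audio();
audio.mozSetup(1, 44100);              // mono, 44.1 kHz

var click = new Float32Array(512);     // short sound, no padded zeros
for (var i = 0; i < click.length; i++) {
  click[i] = 0.5 * Math.exp(-i / 64);  // decaying impulse
}
audio.mozWriteAudio(click);

// No "playback finished" event exists for written samples, so poll:
var poll = setInterval(function () {
  if (audio.mozCurrentSampleOffset() >= click.length) {
    clearInterval(poll);
    // the written samples have finished playing
  }
}, 10);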
Comment #22 asked about synchronizing audio to some visual output; I'm wondering about this too -- is there some "playback started" event that we can use to sync visual cues?

I've created a synthesizer & sequencer and can play back my tracks, but am having problems syncing visual feedback to audio events; the playback seems somewhat non-deterministic...

My ultimate goal is to make a real-time WebGL demo, but without the ability to sync events, I don't think I can do it.

--
Jeremy
Jeremy, please find us on irc.mozilla.org/audio, and we'll give you a hand.
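
For anyone hitting the same problem, here's a hedged sketch of one way to derive a sync clock from the API; it assumes the destination element was set up via mozSetup() with a known channel count and rate, and drives visuals from the audio position rather than from write scheduling:

var channels = 2, rate = 44100;
var output = new Audio();
output.mozSetup(channels, rate);

function playbackSeconds() {
  // mozCurrentSampleOffset() counts interleaved samples across all channels.
  return output.mozCurrentSampleOffset() / (rate * channels);
}

// Drive visual cues from the audio clock rather than from write scheduling:
setInterval(function () {
  var t = playbackSeconds();
  // position sequencer UI / visual cues based on t ...
}, 16);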
(In reply to comment #162)
> Yeah, MIDI would be great. Please open a new bug for that.

Olli, any news about that? Do you think it's a good thing to start some work on MIDI right now or to wait to include that on other efforts like MediaStream API?
Depends on: 745272
Depends on: 745748
Depends on: 693067
I did not find a good place in Bugzilla to classify a bug as relevant to the Audio API, so I am placing a reference here to point out what I suspect is a garbage collection issue.
https://bugzilla.mozilla.org/show_bug.cgi?id=966847