Closed Bug 598812 Opened 9 years ago Closed 9 years ago

in the audio data API the event.time returned by MozAudioAvailable event is inaccurate

Categories

(Core :: Audio/Video, defect, minor)


Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: silviapfeiffer1, Assigned: yury)

References

Details

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.8 Safari/534.7
Build Identifier: 

According to the documentation at https://wiki.mozilla.org/Audio_Data_API, the MozAudioAvailable event provides an event.time value together with the event.frameBuffer data. event.time is defined as:

The '''time''' attribute contains a float representing the time in seconds of the first sample in the '''frameBuffer''' array since the start of the audio track.

Therefore, that value should be equal to

 n * audio.mozFrameBufferLength/(audio.mozChannels*audio.mozSampleRate)

for audio that has been playing back without interruption, where n is the number of times that MozAudioAvailable was fired previously.
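For concreteness, the expected value can be sketched as follows (a hypothetical helper, not part of the API; it assumes uninterrupted playback from the start of the track):

```javascript
// Hypothetical sketch: predicted event.time for the n-th MozAudioAvailable
// callback, given uninterrupted playback from time zero.
function expectedEventTime(n, frameBufferLength, channels, sampleRate) {
  // frameBufferLength counts interleaved samples across all channels,
  // so divide by channels * sampleRate to get seconds.
  return n * frameBufferLength / (channels * sampleRate);
}
```

For example, with a 2048-sample buffer at 2 channels and 44100 Hz, each event should advance event.time by roughly 23.2 ms.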

However, in discussions with Dave Humphrey, it was clarified that currently event.time returns the start time of the decode data that relates to the framebuffer rather than the start time of the framebuffer itself. In all codecs this is slightly earlier than the actual framebuffer start offset, e.g. http://pastebin.mozilla.org/794693 .

To make the implementation match the documentation, it should return the offset of the framebuffer data within the decoded (uncompressed) stream, because at that stage we do not care what encoding format the data came in or what influence that has on the offset.

Reproducible: Always
Status: UNCONFIRMED → ASSIGNED
Component: General → Video/Audio
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → video.audio
Version: unspecified → Trunk
Changing this won't break the API at all, and should be a pretty easy patch.  I'll look into it.
Assignee: nobody → david.humphrey
I wonder if mozCurrentSampleOffset() is inaccurate because of the same issue. I played back a file, recorded the length of every framebuffer I handed to it, and compared the running total to the reported sample offset; the offset overtakes the total halfway through, see http://pastebin.mozilla.org/795142 .

There, too, we should deal with actual samples written to the audio device rather than samples decoded for writing to that buffer (I assume we are doing that because I can't otherwise explain how the sample offset can overtake the number of samples written).
And while we're at it, I'd like to add that there is an example in the docs at https://wiki.mozilla.org/Audio_Data_API that surely hasn't actually been run. It's the one that describes the use of mozCurrentSampleOffset.

It uses:
   var frameBuffer = event.mozFrameBuffer;
which should read
  var frameBuffer = event.frameBuffer;

And it uses:
     buffer.concat(audio);
to try and concatenate Float32Arrays, which is unfortunately an undefined function. Instead you need to do:
        // If there's buffered data, write that first
        var buffer2 = new Float32Array(buffer.length + audio.length);
        var i, j;
        for (i = 0; i < buffer.length; i++) {
          buffer2[i] = buffer[i];
        }
        for (j = 0; j < audio.length; j++) {
          buffer2[i + j] = audio[j];
        }
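As a side note, the same concatenation can be done more compactly with the typed array set() method, which copies one array into another at a given offset (a sketch with the same behavior as the loops above):

```javascript
// Concatenate two Float32Arrays using Float32Array.prototype.set()
// instead of element-by-element copy loops.
function concatFloat32(buffer, audio) {
  var result = new Float32Array(buffer.length + audio.length);
  result.set(buffer, 0);            // buffered data first
  result.set(audio, buffer.length); // then the new samples
  return result;
}
```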
We made a wrong assumption about the mTime field in the SoundData object. It contains the estimated time of the data block returned by the decoder. This value can be less than the actual time measured in samples [1]. We need to change the logic to use playedSamples instead of sampleTime when the PlaySilence and PlayFromAudioQueue methods are called.

The mozCurrentSampleOffset method returns the sample offset that is currently being played by the hardware. Currently, there is no way to get the total number of samples written by the mozWriteAudio calls.

  [1] http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderStateMachine.cpp#423
(In reply to comment #3)
> And while we're at it, I'd like add that there is an example in the docs at
> https://wiki.mozilla.org/Audio_Data_API that surely hasn't been actually
> implemented. It's the one that describes the use of mozCurrentSampleOffset. 
> 
> It uses:
>    var frameBuffer = event.mozFrameBuffer;
> which should read
>   var frameBuffer = event.frameBuffer;
> 
> And it uses:
>      buffer.concat(audio);
> to try and concatenate Float32Arrays, which is unfortunately an undefined
> function. 

The wiki examples were modified to remove the costly concat() method.
Periodic samples in the media file are encoded with a timestamp, and we use this to seed SoundData::mTime, with intermediate mTime values being offset from this by the number of samples they contain.

As such, mTime may be rounded down to the nearest ms, so it should be as accurate as rate/1000. Are you seeing mTime being off by more than rate/1000?

I would think that mTime could only be off by more than rate/1000 when the media was incorrectly timestamped. Do you have a testfile which shows this?

I filed bug 598829 yesterday suggesting we should use playedSamples in the PlaySilence() call. sampleTime should be playedSamples otherwise...
(In reply to comment #6)
> As such, mTime may be rounded down to the nearest ms, so it should be as
> accurate as rate/1000. Are you seeing mTime being off by more than rate/1000?

I wasn't clear; what I mean by this is that you're fine to use playedSamples as the timestamp for PlayFromAudioQueue(), but if it's off by more than rate/1000 samples, you're seeing a bug somewhere!
(In reply to comment #6)
> Periodic samples in the media file are encoded with a timestamp, and we use
> this to seed SoundData::mTime, with intermediate mTime values being offset from
> this by the number of samples they contain.
> 
> As such, mTime may be rounded down to the nearest ms, so it should be as
> accurate as rate/1000. Are you seeing mTime being off by more than rate/1000?
> 
> I would think that mTime could only be off by more than rate/1000 when the
> media was incorrectly timestamped. Do you have a testfile which shows this?

That may be correct. I didn't see them being off by more than a few hundred
milliseconds, but that is still substantial, e.g. 0.044 seconds for a 44.1kHz
file. Can we avoid the rounding down?
This rounding down shouldn't cause them to be off by more than 1 ms. Perhaps there's something else going on too?
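To make that bound concrete, here is a toy sketch (hypothetical names, not Gecko code) of what flooring a sample offset to whole milliseconds costs — the loss is always less than rate/1000 samples, i.e. under 1 ms:

```javascript
// Toy sketch: floor a sample offset to whole ms and convert back;
// the number of samples lost is always in [0, rate/1000).
function floorToMsError(samples, rate) {
  var ms = Math.floor(samples * 1000 / rate); // truncate to whole ms
  var recovered = ms * rate / 1000;           // convert back to samples
  return samples - recovered;                 // samples lost to truncation
}
```

For a 44100 Hz stream the loss is therefore capped at 44.1 samples, exactly the rate/1000 figure discussed above.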
(In reply to comment #4)
> 
> The mozCurrentSampleOffset method returns the sample offset that is currently
> being played by the hardware. Currently, there is no way to get the total
> amount of written by the mozWriteAudio calls samples.

What I am seeing is that the sample offset reported as currently being played by the hardware is a sample number that hasn't been written to the sample buffer yet. That can only happen if the hardware has seen, and counted, samples from another source in between the samples I have sent. However, I have not played anything else, so I wonder where that count comes from.
(In reply to comment #9)
> This rounding down shouldn't cause them to be off by more than 1 ms. Perhaps
> there's something else going on too?

rate/1000 is more than 1 ms though - what am I misunderstanding?
(In reply to comment #5)
> The wiki examples were modified to remove the costly concat() method.

That's much more elegant, thanks!
(In reply to comment #2)
> of the same issue. I have played back a file and recorded every framebuffer
> length I have given it and compared that to the reported sample offset and it
> is overtaking that half way through, see http://pastebin.mozilla.org/795142 .

(In reply to comment #10)
> What I am seeing is that the sample offset that is currently being played by
> the hardware is a sample number that hasn't been written to the sample buffer
> yet. That can only be the case if the hardware has seen samples from another
> source in between samples that I have sent and counted those, too. However, I
> have not played anything else, so I do wonder where that count comes from.

I could not replicate the results posted in the pastebin. I don't think it can be invalid mozSetup parameters. Silvia, could you post the complete source code, the basic information (channel count and sampling rate) about the audio, and the platform information you are using?

Thanks
> I would think that mTime could only be off by more than rate/1000 when the
> media was incorrectly timestamped. Do you have a testfile which shows this?
> 

By applying the following patch to the existing test:

diff -r 54d40057f55a content/media/test/test_a4_tone.html
--- a/content/media/test/test_a4_tone.html Thu Sep 23 20:09:00 2010 -0700
+++ b/content/media/test/test_a4_tone.html Thu Sep 23 22:13:31 2010 -0500
@@ -178,7 +178,7 @@
   spectrumMaxs.push({ value: maxValue, index: maxIndex, time: (currentSampleOffset / testFileSampleRate) });
 
   if( (typeof event.time !== "number") ||
- (Math.abs(event.time - currentSampleOffset / testFileSampleRate) > 0.01) ) {
+ (Math.abs(event.time - currentSampleOffset / testFileSampleRate) >= 0.001) ) {
     isTimePropertyValid = false;
   }
 
The test starts failing: the difference between event.time and the expected time is more than 1 ms (though it does not exceed 2 ms). By changing sampleTime to playedSamples in the PlayFromAudioQueue() call, the difference between event.time and the expected time becomes less than 1 ms.

This test uses the content/media/test/file_a4_tone.ogg file. There is also a problem with the logic for calculating event.time for the last frame, which keeps the test from passing.
(In reply to comment #13)
> I could not replicate the results posted in the pastebin. I don't think it
> can be invalid mozSetup parameters. Silvia, could you post the complete source
> code, the basic information (channel count and sampling rate) about the audio,
> and the platform information you are using?


Sure, no problem, here it is:

<!DOCTYPE html>
<html lang="en">
  <body>
    <audio src="HelloWorld.ogg" controls></audio>
    <p id="buffersize"></p>
    <script type="text/javascript">
      var input = document.getElementsByTagName("audio")[0];
      var par = document.getElementById("buffersize");
      input.volume = 0;
      var audio = new Audio();
      input.addEventListener("loadedmetadata", getMetadata, false);
      var sampleRate, channels;
      function getMetadata() {
        channels = input.mozChannels;
        sampleRate = input.mozSampleRate;
        audio.mozSetup(channels, sampleRate);
      }
      
      var written = 0;
      input.addEventListener("MozAudioAvailable", writeSamples, false);
      function writeSamples (event){
        written += audio.mozWriteAudio(event.frameBuffer);
        var offset = audio.mozCurrentSampleOffset();
        par.innerHTML += written + " written<br/>";
        par.innerHTML += offset + " sample position<br/>";
      }
    </script>
  </body>
</html>

I'll also attach both the audio file I use and the web page.
the attached Ogg audio file is my test file for the bugs seen in this bug report
test file of Silvia for the mozCurrentSampleOffset issues
(In reply to comment #15)
> Sure, no problem, here it is:
> [...]
> I'll also attach both the audio file I use and the web page.

2 channels, 44100 Hz sampling rate; on my system I get frameBuffers of 2048 samples.
(In reply to comment #14)
> The test starts failing: the difference between the event.time and the expected
> time is more than 1 ms (however it does not exceed 2ms).  By changing the
> sampleTime to the playedSamples in the PlayFromAudioQueue() call, the
> difference between the event.time and the expected time becomes less than 1ms. 

event.time is sourced from this value [ http://mxr.mozilla.org/mozilla-central/source/content/media/nsAudioAvailableEventManager.cpp#143 ], which is rounded down to the nearest ms and stored in a PRUint64, so its value carries no fractions of a ms (beyond the imprecision inherent in floating point). All numbers in JS are floats, so when you compare event.time to (currentSampleOffset / testFileSampleRate), that calculation has float precision and thus does carry fractions of a ms, while event.time does not. This discrepancy will account for a little of the error you're seeing.

If a fraction of a ms matters to you guys, you could store the event's time value as a sample count until just before it gets dispatched on an event to JS, and convert it to a float there.
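That suggestion could look roughly like this (a hypothetical sketch, not the actual event manager code): keep an exact integer sample count internally and perform the single float division only when the event is dispatched:

```javascript
// Hypothetical sketch: track time as an exact integer sample count and
// convert to float seconds only at dispatch, avoiding intermediate
// millisecond truncation.
function EventTimeTracker(sampleRate, channels) {
  this.perSecond = sampleRate * channels; // interleaved samples per second
  this.samples = 0;                       // exact count handed out so far
}
EventTimeTracker.prototype.nextEventTime = function (frameBufferLength) {
  var t = this.samples / this.perSecond; // one float division, at dispatch
  this.samples += frameBufferLength;     // integer arithmetic stays exact
  return t;
};
```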


> This test is using the content/media/test/file_a4_tone.ogg file. There is also
> the problem with the logic of calculating event.time for the last frame, that
> keeps test from passing.

The file looks correct enough: only 1 page, which appears to have the advertised amount of audio.

$ OggIndex -o /dev/null -d -p content/media/test/file_a4_tone.ogg
Writing output to 'nul'
[V] page @0 length=58 granulepos=0 end_time=-1ms s=1684669437 packet_starts=1 packet_ends=1 checksum=3683963353
[V] ver=0 Ident packet
[V] page @58 length=3859 granulepos=0 end_time=-1ms s=1684669437 packet_starts=2 packet_ends=2 checksum=1545798439
[V] ver=0 Comment packet
[V] ver=0 Setup packet
[V] page @3917 length=2072 granulepos=132300 end_time=3000ms s=1684669437 packet_starts=146 packet_ends=146 checksum=794002101
[V] sample time_ms=[0,3000] granulepos=[0,132300] eos
Vorbis/1684669437 index uses 62 bytes, compresses to 45 (72.5806%), duration [0,3000] ms
Skeleton 4.0 track with keyframe indexes uses 296 bytes, 4.70963% overhead
The mozCurrentSampleOffset() discussion thread is related to bug 520100.
Blocks: 598829
Comment on attachment 478605 [details] [diff] [review]
Make the mTime attribute a float; fixes last-frame time and adds a fix for gaps in the audio stream

Looks good to me. Use C++ style static_cast<float>() instead of C-style (float) casts.
Attachment #478605 - Flags: review?(chris) → review+
blocking2.0: --- → ?
Comment on attachment 478605 [details] [diff] [review]
Make the mTime attribute a float; fixes last-frame time and adds a fix for gaps in the audio stream

-      time = MILLISECONDS_PER_SECOND * (aEndTimeSampleOffset -
-             mSignalBufferPosition - audioDataLength) / mSamplesPerSecond;
+      time = (aEndTimeSampleOffset - mSignalBufferPosition - audioDataLength) / 
+             static_cast<float>(mSamplesPerSecond);

You should move the cast of mSamplesPerSecond to ::Init() and just do it once, since you always want it as a float now, and we don't use it anywhere else.

Looks good otherwise, thanks for driving this.
Attachment #478605 - Flags: feedback?(david.humphrey) → feedback+
Assignee: david.humphrey → async.processingjs
I don't understand. If the problem is that mTime was not precise enough at 1 ms resolution, making it a 32-bit float means it will have less than 1 ms resolution after about 2 hours 20 minutes, which doesn't seem very long. Is there a reason that double wouldn't be much better?
There is a problem with the time being rounded down three times: the first "flooring" happens in the Ogg decoder's Time() function, the second when converting to sampleTime in AudioLoop(), and the third was happening in the AudioAvailableEventManager.

The patch eliminates the last "flooring" operation. I think that to fix the first two we need to replace sampleTime with playedSamples in AudioLoop().
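The compounding can be illustrated with a toy example (hypothetical functions, not the actual Gecko code paths): each extra flooring stage can shave off up to another millisecond compared with a single final conversion:

```javascript
// Toy illustration of repeated truncation: three successive floors
// (decoder Time(), sampleTime conversion, event manager) versus one.
function tripleFlooredMs(samples, rate) {
  var ms1 = Math.floor(samples * 1000 / rate); // first "flooring"
  var s2 = Math.floor(ms1 * rate / 1000);      // back to a sample count
  return Math.floor(s2 * 1000 / rate);         // floored again at the end
}
function singleFlooredMs(samples, rate) {
  return Math.floor(samples * 1000 / rate);    // a single final conversion
}
```

For instance, 45 samples at 44100 Hz floor to 1 ms with a single conversion but to 0 ms after three, so the repeated truncation visibly loses time.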
This should block because without it the timestamps reported by the audio-available event, which all the fancy audio API demos use, are often inaccurate. It's a low risk change, which doesn't touch any of the decoding logic. Blocking final is fine.
blocking2.0: ? → final+
Keywords: checkin-needed
Attachment #478605 - Attachment is obsolete: true
Whiteboard: [needs landing]
http://hg.mozilla.org/mozilla-central/rev/f0a151e89271
Thanks for the patch!
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing]