Closed Bug 1240995 Opened 4 years ago Closed 4 years ago

Use common function accessor between FFmpeg and FFVPX to call FFmpeg API

Categories

(Core :: Audio/Video: Playback, defect)

45 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: jya, Assigned: jya)

References

Details

Attachments

(2 files)

From bug 1240630 comment 5
--- Comment #5 from Anthony Jones (:kentuckyfriedtakahe, :k17e) <ajones@mozilla.com> 2016-01-19 18:20:33 PST ---
I would prefer to see a struct of function pointers be passed around instead of
the namespaces.

Would be much more elegant
This greatly simplify how the external libavcodec and libavutil are linked.
Attachment #8709879 - Flags: review?(ajones)
Attachment #8709879 - Flags: review?(ajones) → review+
Flags: needinfo?(jyavenard)
njn, mccr8:

so the issue is that ffmpeg on first use initialise a static global mutex (a pthread_mutex_t, which coincidentally happens to be 40 bytes big on linux 64 bits.
(gdb) print sizeof(pthread_mutex_t)
$1 = 40
)

allocation is done here:
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/utils.c#L77

it's done the first time a codec is initialised.

various bug have been opened at ffmpeg , suggesting a global de-initialisation routine.
The comment is:
"this leak is just a single mutex for the whole program execution and will get freed by the OS after program execution. So the only issue that this causes is a warning in tools like valgrind.
Also its non trivial to free this mutex without race conditions"

njn ran valgrind on the current master, prior any of the changes to the FFmpeg PDM and saw exacly that same leak.
==11709== 40 bytes in 1 blocks are definitely lost in loss record 6,697 of 13,472
==11709==    at 0x4C2E006: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11709==    by 0x4C2E111: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11709==    by 0x27EED143: ???
==11709==    by 0x4257E6D3: ???
==11709==    by 0x42583CDD: ???
==11709==    by 0x42584003: ???
==11709==    by 0x124315B0: mozilla::FFmpegDataDecoder<46465650>::InitDecoder() (FFmpegDataDecoder.cpp:82)
==11709==    by 0x12432650: mozilla::FFmpegVideoDecoder<46465650>::Init() (FFmpegVideoDecoder.cpp:120)
==11709==    by 0x1235BA50: mozilla::MediaFormatReader::EnsureDecoderInitialized(mozilla::TrackInfo::TrackType) (MediaFormatReader.cpp:426)
==11709==    by 0x1236B928: mozilla::MediaFormatReader::HandleDemuxedSamples(mozilla::TrackInfo::TrackType, mozilla::AbstractMediaDecoder::AutoNotifyDecoded&) (MediaFormatReader.c
pp:884)
==11709==    by 0x1236C6F1: mozilla::MediaFormatReader::Update(mozilla::TrackInfo::TrackType) (MediaFormatReader.cpp:1155)
==11709==    by 0x1236E5C9: apply<mozilla::MediaFormatReader, void (mozilla::MediaFormatReader::*)(mozilla::TrackInfo::TrackType)> (nsThreadUtils.h:676)
==11709==    by 0x1236E5C9: nsRunnableMethodImpl<void (mozilla::MediaFormatReader::*)(mozilla::TrackInfo::TrackType), true, mozilla::TrackInfo::TrackType>::Run() (nsThreadUtils.h:
870)

so why the leak is suddenly revealed with this commit, I have no idea.
But there's nothing we can do about it.

So how do I go about ASAN not reporting that leak?

Thank you.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(jyavenard)
Flags: needinfo?(continuation)
To prove my point, I modified:
            pthread_mutex_t *tmp = av_malloc(sizeof(pthread_mutex_t));
into:
            pthread_mutex_t *tmp = av_malloc(sizeof(pthread_mutex_t) + 28);

https://treeherder.mozilla.org/logviewer.html#?job_id=15783015&repo=try

changed the asan failure from:
SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).

to:
SUMMARY: AddressSanitizer: 68 byte(s) leaked in 1 allocation(s).

(28 is my birthday, 28 of July that is, always happy with early presents)
Docs on LSAN's suppressions are at the bottom of https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer.

The existing suppressions are in build/sanitizers/lsan_suppressions.txt.

The problem is that the stack traces you get on TreeHerder only contain the single frame, for __interceptor_posix_memalign. If you use that in a suppression then you'll suppress every single leak that ever involves posix_memalign. posix_memalign is fairly rare so this is less of a problem than if it was malloc, but still, it has the potentially to hide future leaks. If ASAN was able to produce with the stack that Valgrind does in comment 7 you could put something more specific like FFmpegDataDecoder<46465650>::InitDecoder in the suppression instead.

I also don't know why this leak would start showing up now when it hasn't previously.

mccr8 knows more about LSAN than I do so he might be able to say more.
Are you certain the mutex creation code is run on TreeHerder machines when your patch isn't applied?
Flags: needinfo?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #10)
> Are you certain the mutex creation code is run on TreeHerder machines when
> your patch isn't applied?
Yes I am.

This mutex is created on the first call to avcodec_open2 which occurs in FFmpegDataDecoder.cpp:
https://dxr.mozilla.org/mozilla-central/source/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp#80
which calls https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/utils.c#L1167

The lock is created on the first call of ff_lock_avcodec(avctx, codec);
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/utils.c#L1194

I find their whole lock allocation a tad silly as the check to see if the mutex has been allocation isn't thread-safe:
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/utils.c#L76

Now this is FFmpeg master, it didn't used to do that in the LibAV 0.8.5 found on try machines by default. But this change: http://hg.mozilla.org/mozilla-central/rev/ea9f039b70f14fdd06bb29eddca42c3d5b59bb71

made it use a much more recent version of FFmpeg (same as git master) did land in central today ; yet it didn't trigger the asan failure.

Now that one i would have understood if it got the asan error ; but not this commit.

The first valgrind leak you reported (identical) didn't have my changes right ?
I believe I've found a work around using one of the FFmpeg API...

Will get back to you if that doesn't work
Flags: needinfo?(continuation)
This ensure the memory allocated for their internal mutex is cleared
Attachment #8710974 - Flags: review?(gsquelart)
Comment on attachment 8710974 [details] [diff] [review]
[ffmpeg] P2. Deregister libavcodec global mutex.

Review of attachment 8710974 [details] [diff] [review]:
-----------------------------------------------------------------

r+, with nit:

::: dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp
@@ +132,5 @@
>  FFmpegLibWrapper::Unlink()
>  {
> +  if (av_lockmgr_register) {
> +    // Clear libavcodec and libavutil global mutexes.
> +    av_lockmgr_register(nullptr);

It'd be nice to give a bit more details about how this call has the effect described in your comment.
E.g.: "Destroy libav* global mutexes, by deregistering the default lockmgr that allocated them".
Attachment #8710974 - Flags: review?(gsquelart) → review+
Depends on: 1241773
(In reply to Nicholas Nethercote [:njn] from comment #9)
> I also don't know why this leak would start showing up now when it hasn't
> previously.

It is possible that the leak shows up because the changes somehow cause the stack being reported to be only a single frame, and before they were suppressed because of something deeper in the stack. It wouldn't be too hard to figure out if that's the case, but I'm not sure it is worth anybody's time.

> mccr8 knows more about LSAN than I do so he might be able to say more.

For future reference, in addition to the suppression list which njn mentioned (and which has he said you could not really use in this case), you can also tell LSan that a particular object should be ignored if it leaks by calling MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT on it.

Thanks for investigating this leak and fixing it, jya. It is good to avoid suppressions when we can, because you always risk the possibility of a suppression masking some real issue that shows up later.
Another possibility for why this started showing up is that maybe it changed it so that the leaked object isn't reachable from a global variable any more in a way that LSan could understand. (LSan doesn't treat those as leaks.) Valgrind may not ignore that kind of leak.
https://hg.mozilla.org/mozilla-central/rev/2081a0b73f80
https://hg.mozilla.org/mozilla-central/rev/67c37b7d8cd2
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Depends on: 1323382
You need to log in before you can comment on or make changes to this bug.