Closed
Bug 1293577
Opened 9 years ago
Closed 9 years ago
Crash in mozilla::image::Decoder::HasProgress
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
FIXED
mozilla51
| Tracking | Status | |
|---|---|---|
| firefox51 | --- | fixed |
People
(Reporter: ting, Assigned: aosmond)
References
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 1 obsolete file)
|
5.30 KB,
patch
|
aosmond
:
review+
|
Details | Diff | Splinter Review |
|
1.37 KB,
patch
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-1ba88cc4-d70a-4916-aecd-733e12160809.
=============================================================
#42 of 0807 Nightly on Windows, 3 crashes from 3 installations.
| Reporter | ||
Comment 1•9 years ago
|
||
From the minidump, |mDecoder| [1] is null. But there's a null check in the beginning of DecodedSurfaceProvider::Run(), also only FinishDecoding() can set mDecoder null. Is it possible that the task is executed by two DecodePoolWorker, or do you have any other ideas?
[1] https://hg.mozilla.org/mozilla-central/annotate/d42aacfe34af/image/DecodedSurfaceProvider.cpp#l144
Flags: needinfo?(seth.bugzilla)
| Reporter | ||
Updated•9 years ago
|
Crash Signature: [@ mozilla::image::Decoder::HasProgress] → [@ mozilla::image::Decoder::HasProgress]
[@ mozilla::image::DecodedSurfaceProvider::Run]
| Assignee | ||
Comment 3•9 years ago
|
||
I believe you are correct, the same task gets executed by two workers.
In DecodedSurfaceProvider::Run, we call Decoder::Decode:
https://dxr.mozilla.org/mozilla-central/rev/720b5d2c84d5b253d4dfde4897e13384dc97a46a/image/DecodedSurfaceProvider.cpp#133
Decoder::Decode in turn will call DoDecode for a specific decoder which can call into StreamingLexer::Lex. SourceBufferIterator::AdvanceOrScheduleResume is called which can schedule IDecodingTask::Resume to be called later when there is more data.
https://dxr.mozilla.org/mozilla-central/rev/720b5d2c84d5b253d4dfde4897e13384dc97a46a/image/StreamingLexer.h#431
IDecodingTask::Resume itself will put the same DecodedSurfaceProvider object into the decode work queue. In theory all of this can happen before we return from the original Decoder::Decode call. The new DecodedSurfaceProvider::Run call can then finish processing the image and clear our reference.
Comment 4•9 years ago
|
||
Yep, that's right! Thanks for breaking it down, Andrew. Now that you made the mistake of posting in the bug, I'm going to ask you to review the patch. =p
This was an oversight on my part, which is troubling because I wrote all the code involved here and theoretically knew about this issue. It's pretty clear that after all the refactoring that has happened recently, this API (I'm specifically talking about the way IResumable works in conjunction with SourceBufferIterator) doesn't really match our current needs cleanly and has become a bit of a footgun. I have an idea of how to make it better, but it's probably better to wait until the other changes happening lately have died down before tackling that, as it's kinda invasive.
In the short term the simplest fix is just to add a lock, so that's what we'll do. It shouldn't be an issue from a performance point of view, as I expect the lock to be uncontended the vast, vast majority of the time.
Flags: needinfo?(seth.bugzilla)
| Assignee | ||
Comment 5•9 years ago
|
||
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Attachment #8779508 -
Flags: review?(seth.bugzilla)
Comment 6•9 years ago
|
||
Comment on attachment 8779508 [details] [diff] [review]
add mutex to DecodedSurfaceProvider and AnimationDecodingTask
Review of attachment 8779508 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good! Thanks for tackling this, Andrew!
Just had some style nits:
::: image/DecodedSurfaceProvider.h
@@ +81,5 @@
>
> /// The key under which we're stored as a cache entry in the surface cache.
> SurfaceKey mSurfaceKey;
> +
> + /// Mutex protecting access to the decoder
Nit: please end comments with a period. =)
Non-nit: rather than saying "the decoder", it's much better to say "mDecoder". That way, searching the code for mDecoder will turn this up. Also, please move this next to mDecoder (the usual pattern is right above it), again to make it more noticeable.
::: image/IDecodingTask.h
@@ +83,5 @@
> virtual ~AnimationDecodingTask() { }
>
> NotNull<RefPtr<Decoder>> mDecoder;
> +
> + Mutex mMutex;
Follow the same pattern here. I realize it's obvious in this case since there's only one other member, but it's important to always document which members a mutex is protecting. (Especially since things may change later.)
Attachment #8779508 -
Flags: review?(seth.bugzilla) → review+
| Assignee | ||
Comment 7•9 years ago
|
||
Fix nits.
Attachment #8779508 -
Attachment is obsolete: true
Attachment #8779723 -
Flags: review+
| Assignee | ||
Comment 8•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/64863b1fcf6470de45a6f3979dd3b1709e4b10e8
Bug 1293577 - Protect the image decoding task path with a mutex to avoid race conditions. r=seth
Comment 9•9 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 10•9 years ago
|
||
Andrew, I realized after reviewing this that MetadataDecodingTask needs this treatment as well. I believe this might need to be uplifted to aurora (though I'd need to double check) so it'd probably be good to fix them all in one bug. Could you please add a followup to fix this for MetadataDecodingTask as well?
Flags: needinfo?(aosmond)
| Assignee | ||
Comment 11•9 years ago
|
||
Seth, is it just for consistency purposes? I don't see a race in MetadataDecodingTask::Run, because if it reaches a terminal state, it should not have a pending task reference to run, and if it isn't terminal, there is nothing to race for.
Flags: needinfo?(aosmond) → needinfo?(seth.bugzilla)
Comment 12•9 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #11)
> Seth, is it just for consistency purposes? I don't see a race in
> MetadataDecodingTask::Run, because if it reaches a terminal state, it should
> not have a pending task reference to run, and if it isn't terminal, there is
> nothing to race for.
It's not just for consistency purposes. Decoder does no synchronization on its own and its methods aren't reentrant. If we allow reentrant calls to Decoder::Decode(), it will be extremely easy to introduce subtle issues that will potentially be a nightmare to debug. For example, consider what would happen if someone introduces code that reads or writes any Decoder state whatsoever here:
https://dxr.mozilla.org/mozilla-central/source/image/Decoder.cpp?q=Decoder.cpp&redirect_type=direct#133
Or someone adds code to read or write some StreamingLexer state here:
https://dxr.mozilla.org/mozilla-central/source/image/StreamingLexer.h?q=StreamingLexer.h&redirect_type=direct#436
Instant data race.
This is really not something we can allow to lurk as a footgun in the tree.
Flags: needinfo?(seth.bugzilla) → needinfo?(aosmond)
| Assignee | ||
Comment 13•9 years ago
|
||
Exactly the same change to MetadataDecodingTask as to AnimationDecodingTask, as per comment 10 (so I took the liberty of carrying r+ :)).
Flags: needinfo?(aosmond)
Comment 14•9 years ago
|
||
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3eade32e18c4
Part 2. Add mutex to MetadataDecodingTask to protect from decoding races. r=seth
Comment 15•9 years ago
|
||
| bugherder | ||
Comment 16•9 years ago
|
||
Thank you for taking care of that, Andrew. I really appreciate it!
You need to log in
before you can comment on or make changes to this bug.
Description
•