Closed
Bug 896268
Opened 12 years ago
Closed 11 years ago
Assertion failure: NS_IsMainThread() and crash [@mozilla::image::Decoder::AllocateFrame]
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
FIXED
mozilla29
Tracking | Status | |
---|---|---|
firefox27 | --- | wontfix |
firefox28 | + | verified |
firefox29 | --- | fixed |
firefox-esr24 | 28+ | fixed |
b2g18 | --- | unaffected |
b2g-v1.1hd | --- | unaffected |
b2g-v1.2 | --- | fixed |
b2g-v1.3 | --- | fixed |
b2g-v1.3T | --- | fixed |
b2g-v1.4 | --- | unaffected |
People
(Reporter: posidron, Assigned: seth)
References
Details
(6 keywords, Whiteboard: [fuzzblocker][adv-main28+][adv-esr24.4+])
Crash Data
Attachments
(5 files)
2.93 KB,
text/plain
|
Details | |
1.28 KB,
text/html
|
Details | |
70.28 KB,
application/x-zip-compressed
|
Details | |
292 bytes,
text/html
|
Details | |
36.41 KB,
patch
|
jdm
:
review+
abillings
:
approval-mozilla-beta+
abillings
:
approval-mozilla-release+
abillings
:
approval-mozilla-esr24+
|
Details | Diff | Splinter Review |
Load the attached testcase and click the "Ok" button for every two popup boxes, then wait a few seconds.
Assertion failure: NS_IsMainThread(), at /Users/cdiehl/dev/repos/mozilla/mozilla-inbound/image/src/Decoder.cpp:197
Tested with http://hg.mozilla.org/integration/mozilla-inbound/rev/9fd04fd70fc2
Reporter | ||
Comment 1•12 years ago
|
||
Reporter | ||
Comment 2•12 years ago
|
||
Updated•12 years ago
|
Crash Signature: [@ mozilla::image::Decoder::AllocateFrame()]
Comment 3•12 years ago
|
||
If this only happens in ASan+debug builds, it might be related to bug 895845.
Reporter | ||
Comment 4•12 years ago
|
||
This is a very simplified testcase, the stack is the same.
During the fuzzing run I see this crash as:
###!!! ABORT: Calling InitDecoder() but already decoded!: '!mDecoded', file /Users/cdiehl/dev/repos/mozilla/mozilla-inbound/image/src/RasterImage.cpp, line 1963
Tested against http://hg.mozilla.org/integration/mozilla-inbound/rev/f85f06761d52
Updated•11 years ago
|
Whiteboard: [fuzzblocker]
Reporter | ||
Comment 5•11 years ago
|
||
This also blocks WebGL fuzzing with Textures. Any chance we can get this bug some care?
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(milan)
Updated•11 years ago
|
Flags: needinfo?(milan)
Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Christoph Diehl [:cdiehl] from comment #5)
> This also blocks WebGL fuzzing with Textures. Any chance we can get this bug
> some care?
I've had the testcase running for quite some time without generating a crash.
This is something I should be able to fix without actually reproducing it, just by doing a conservative check in Decoder::Write that we're on the main thread before attempting to allocate frames.
However, I'd like to be able to reproduce this so I can get a better sense of what the sequence of events was leading up to this point. I suspect this is a case where we're calling SetSynchronous(true) on the decoder and either failing to call SetSynchronous(false) or wrongly doing this outside of the decoding lock.
Right now my suspicions fall right here:
https://mxr.mozilla.org/mozilla-central/source/image/src/RasterImage.cpp#2409
This looks obviously unsafe to me but it's hard to be sure this is actually what's causing _this_ issue. I'll write a patch for this regardless.
Assignee | ||
Comment 9•11 years ago
|
||
Heh, sorry, I never got around to my question. Christoph, can you reproduce this outside of the fuzzer or ASAN, just by viewing the testcase on a debug build?
Flags: needinfo?(cdiehl)
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Seth Fowler [:seth] from comment #8)
> I've had the testcase running for quite some time without generating a crash.
Try hitting cmd+r a couple of times.
(In reply to Seth Fowler [:seth] from comment #9)
> Heh, sorry, I never got around to my question. Christoph, can you reproduce
> this outside of the fuzzer or ASAN, just by viewing the testcase on a debug
> build?
Outside of the fuzzer - yes, outside of ASan - not sure haven't tested that.
Flags: needinfo?(cdiehl)
Assignee | ||
Comment 11•11 years ago
|
||
OK, I tried reloading a good deal and still didn't have luck. I'll try this on an ASan build tomorrow; I don't have access to a linux machine today.
Assignee | ||
Comment 12•11 years ago
|
||
Bug 940714, which just hit m-c, took care of one possible cause. I still haven't gotten in front of a linux machine to test an ASan build, but I'll try to do so soon.
Assignee | ||
Comment 13•11 years ago
|
||
I have now diagnosed the problem. It boils down to one line, RasterImage.cpp:3032:
> MutexAutoUnlock(mDecodingMutex);
When we unlock the mutex to send out notifications, all of our invariants are lost. Since we have semi-global variables in imagelib (for example, boolean determining whether a decoder is synchronous), things can go haywire when this happens.
I've pushed a try job with a simple fix for this problem, with the intention of generating an ASan build to verify a fix. The better solution is to eliminate 'mSynchronous' in the Decoder and simply pass it in. I'm going to try that now, but the structure of the code may make that difficult.
We are probably going to keep having problems until we get rid of this call to MutexAutoUnlock, though. The right thing is probably just to send those notifications asynchronously and take the perf hit. =(
https://tbpl.mozilla.org/?tree=Try&rev=31a44ff30d09
Assignee | ||
Comment 14•11 years ago
|
||
This is a test of the patch to remove 'mSynchronous' from imagelib's Decoder class altogether. Synchronous decoding is instead determined statelessly by function arguments. Inter-thread interference thus can't happen.
This is one I'd be more interested in landing if the tests look green.
https://tbpl.mozilla.org/?tree=Try&rev=eeb99256c0c1
Assignee | ||
Comment 15•11 years ago
|
||
Nice! I'm pleased that removing mSynchronous turned out so well. I'm going to use try to generate an ASan build based upon the above patch and verify the fix.
https://tbpl.mozilla.org/?tree=Try&rev=a5e2d5ea01cf
Assignee | ||
Comment 16•11 years ago
|
||
I can no longer reproduce the problem with the build above. Will post a review-ready version of the patch shortly.
Assignee | ||
Comment 17•11 years ago
|
||
This patches removes the 'mSynchronous' variable from the Decoder class entirely, and all of the supporting machinery. Instead, callers to Decode::Write pass in a DecodeStrategy, which is either DECODE_SYNC or DECODE_ASYNC. This provides the same functionality, but it prevents different threads from interfering with each other.
This fixes the specific issue discussed in this bug, and IMO it makes the code cleaner. Work in other bugs will fix the more general issue described in comment #13.
Attachment #8346961 -
Flags: review?(josh)
Group: core-security
Assignee | ||
Comment 18•11 years ago
|
||
Now that this bug is secure, I'll state more clearly that bug 943804 is where the general issue in comment #13 will get fixed. This bug should've been hidden earlier, but I didn't immediately realize the connection between it and some of the other issues in imagelib lately.
Assignee | ||
Comment 19•11 years ago
|
||
Sorry, bug 943803.
Comment 20•11 years ago
|
||
Comment on attachment 8346961 [details] [diff] [review]
Use a stateless approach to synchronous image decoding.
Nice.
Attachment #8346961 -
Flags: review?(josh) → review+
Assignee | ||
Comment 21•11 years ago
|
||
Thanks for the review!
Pushed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/772ce6463fd8
Status: NEW → RESOLVED
Closed: 11 years ago
status-firefox28:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment 23•11 years ago
|
||
Was this trunk only? Otherwise, it should have got a security rating before going in and we may need to take it on branches.
Updated•11 years ago
|
Flags: needinfo?(seth)
Updated•11 years ago
|
status-b2g18:
--- → ?
status-b2g-v1.2:
--- → ?
status-b2g-v1.3:
--- → fixed
status-b2g-v1.3T:
--- → fixed
status-b2g-v1.4:
--- → unaffected
status-firefox-esr24:
--- → ?
Assignee | ||
Comment 24•11 years ago
|
||
(In reply to Al Billings [:abillings] from comment #23)
> Was this trunk only? Otherwise, it should have got a security rating before
> going in and we may need to take it on branches.
Apologies for the slow response! Finally getting my backlog cleared out.
Given that it was reported 7 months ago, I'd say it's not trunk-only. =) This is actually fallout from bug 716140, meaning AFAICT it goes all the way back to FF *22*.
Presumably that means the patch should go at least to ESR 24.
Flags: needinfo?(seth)
Comment 25•11 years ago
|
||
It can't get approved for ESR24 without a rating though (and probably not branches).
It doesn't seem that the crash is exploitable, when it can be triggered. What are the security implications here?
Comment 26•11 years ago
|
||
(Never mind about branches, I didn't see when we'd fixed it.)
status-b2g-v1.1hd:
--- → affected
status-firefox27:
--- → wontfix
Comment 27•11 years ago
|
||
What's the sec rating here? Also, can we get an esr24 nom please? :)
Updated•11 years ago
|
Flags: needinfo?(seth)
Comment 28•11 years ago
|
||
Looks like this isn't going to be going to ESR24.
Whiteboard: [fuzzblocker] → [fuzzblocker][adv-main28+]
Assignee | ||
Comment 29•11 years ago
|
||
(In reply to Al Billings [:abillings] from comment #25)
> It can't get approved for ESR24 without a rating though (and probably not
> branches).
>
> It doesn't seem that the crash is exploitable, when it can be triggered.
> What are the security implications here?
I don't *think* this is exploitable. Bug 943803 fixed the more fundamental issue.
Flags: needinfo?(seth)
Assignee | ||
Comment 30•11 years ago
|
||
I guess what I just posted does not constitute a sec rating, though. Whenever I'm asked to assign a sec rating, I feel like I'm basically choosing arbitrarily. I don't really understand the nuances that exist between the various ratings other than sec-critical (and this isn't sec-critical, AFAICT).
Comment 31•11 years ago
|
||
Does this help?
https://wiki.mozilla.org/Security_Severity_Ratings
Assignee | ||
Comment 32•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #31)
> Does this help?
> https://wiki.mozilla.org/Security_Severity_Ratings
That does help, but a bit of frustration remains for me because if I am not sure of the exact ramifications of a bug (as is frequently the case, because once you lose your invariants it's very difficult to reason about what else can happen) then the only reasonable rating I can give is sec-critical. By these definitions, very few security bugs I've dealt with could be rated any other way.
So I suppose sec-critical is my rating for this bug as well. In the future I will treat that as my default rating unless I am *extremely* convinced that a lower rating is warranted.
Comment 33•11 years ago
|
||
That's a valid conservative rating strategy. It is usually easier to fix the bug than "prove" it is or isn't exploitable in practice, and unless you're convinced a bug is not exploitable it's better to fix it than have someone else prove us wrong.
> Whenever I'm asked to assign a sec rating, I feel like I'm basically choosing arbitrarily.
We all feel that way. Our use of the word "triage" with its war-time connotations is not accidental. Take your best guess and we hope we're right more often than not. Don't spend too much time thinking about the rating because your time is more valuable spent on fixing.
Comment 34•11 years ago
|
||
If this is sec-crit, we're going to want it on esr24. Please nominate it as such :). Up to the release drivers if it's landed for this cycle or the next at this point.
Comment 35•11 years ago
|
||
It is too late for this cycle.
Comment 36•11 years ago
|
||
BTW, this never landed on 28.
Comment 37•11 years ago
|
||
Al, if we're going to be landing Pwn2Own fixes for 28/24.4 still, should we take this while we're at it since we're treating it as a sec-crit?
Flags: needinfo?(abillings)
Comment 38•11 years ago
|
||
This applies with a little bit of care to 28, but <28 will need a more targeted patch.
Flags: needinfo?(seth)
Keywords: branch-patch-needed
Comment 39•11 years ago
|
||
The beta Try run was green.
Comment 40•11 years ago
|
||
Yes, please get this in, especially since it was supposed to go in. At least for 28.
Flags: needinfo?(abillings)
Comment 41•11 years ago
|
||
If it is critical, I would be happy to have it for ESR 24.
Comment 42•11 years ago
|
||
It missed the boat for this release since it is in a few days.
Keywords: sec-critical
Comment 43•11 years ago
|
||
Let's get this sec-critical fix landed on mozilla-release and esr24 branch so that it can ride along with the pwn2own landings happening today for the 28 rc respin.
Comment 44•11 years ago
|
||
Comment on attachment 8346961 [details] [diff] [review]
Use a stateless approach to synchronous image decoding.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140
User impact if declined: sec-crit
Testing completed (on m-c, etc.): on trunk for 3 months
Risk to taking this patch (and alternatives if risky): passes on try
String or IDL/UUID changes made by this patch: none
Attachment #8346961 -
Flags: approval-mozilla-beta?
Comment 45•11 years ago
|
||
This applies much better to <28 with bug 940714 also applied. I've got Try runs going for that right now.
Updated•11 years ago
|
Blocks: 716140
Keywords: regression
Assignee | ||
Comment 46•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #45)
> This applies much better to <28 with bug 940714 also applied. I've got Try
> runs going for that right now.
I'll request uplift for that too. It's kinda a strange situation because bug 940714 is pretty much completely obliterated by the changes in this bug, so we're mainly uplifting it to make this patch easier to apply, but I think it makes sense to do that.
Flags: needinfo?(seth)
Assignee | ||
Comment 47•11 years ago
|
||
Comment on attachment 8346961 [details] [diff] [review]
Use a stateless approach to synchronous image decoding.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec-crit.
User impact if declined: Crashes which may be exploitable.
Fix Landed on Version: 29
Risk to taking this patch (and alternatives if risky): Very low. On m-c for 3 months with no issues. Affects a section of the code which hasn't churned much since the ESR release, so surprising interactions are unlikely.
String or UUID changes made by this patch: None.
Attachment #8346961 -
Flags: approval-mozilla-release?
Attachment #8346961 -
Flags: approval-mozilla-esr24?
Comment 48•11 years ago
|
||
Comment on attachment 8346961 [details] [diff] [review]
Use a stateless approach to synchronous image decoding.
Approving per lsblakk comments.
Attachment #8346961 -
Flags: approval-mozilla-release?
Attachment #8346961 -
Flags: approval-mozilla-release+
Attachment #8346961 -
Flags: approval-mozilla-esr24?
Attachment #8346961 -
Flags: approval-mozilla-esr24+
Attachment #8346961 -
Flags: approval-mozilla-beta?
Attachment #8346961 -
Flags: approval-mozilla-beta+
Comment 49•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/4dd51f45f0bd
https://hg.mozilla.org/releases/mozilla-release/rev/4dd51f45f0bd
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/4dd51f45f0bd
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/6ca3ec4cee1d
https://hg.mozilla.org/releases/mozilla-esr24/rev/a558f298d0f4
Comment 50•11 years ago
|
||
I tried to reproduce this issue on an ASAN build from a couple of days ago but I don't get any assertion failures or crashes. I don't get any "ok" buttons either, pop-up dialogs just keep appearing and disappearing very fast, but I don't see any buttons on them (the text is something like this: "Generating a private key. Please wait...").
I am testing on Ubuntu 12.10 64bit. Is this Mac OS X specific? (if it is, please direct me to some builds to reproduce/verify this fix with).
Updated•11 years ago
|
Updated•11 years ago
|
Flags: needinfo?(seth)
Comment 51•11 years ago
|
||
(In reply to Ioana Budnar, QA [:ioana] from comment #50)
> I tried to reproduce this issue on an ASAN build from a couple of days ago
This is actually "a couple of months ago". To be more specific, I tested with the Firefox 29.0a1 ASAN build from 12/10.
I tested this again today with an RC ASAN from 03/15. The difference compared to the pre-fix build when using the attached testcase is that here I get a dialog for setting a password for the Software Security Device. The pop-up dialog that asks for the password is then re-displayed after each "ok" or "cancel". The simplified testcase freezes the browser. It doesn't get to any crashes or other issues, it just stays frozen until the user closes it (it can be closed from the [x] button, you don't need to kill the process).
Comment 52•11 years ago
|
||
Matt, can you please help in verifying this is fixed ahead of release tomorrow?
Flags: needinfo?(mwobensmith)
Comment 53•11 years ago
|
||
It takes a minute or two of running with these test cases to see a crash. I wasn't able to do it with a non-ASan build - these are local ASan builds.
Confirmed crash in 24.3.0esrpre, 2014-03-12.
Verified fixed in Fx28, 2014-03-17.
Still testing other branches.
Flags: needinfo?(mwobensmith)
Updated•11 years ago
|
Whiteboard: [fuzzblocker][adv-main28+] → [fuzzblocker][adv-main28+][adv-esr24.4+]
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(seth)
Updated•11 years ago
|
QA Contact: mwobensmith
Updated•10 years ago
|
Group: core-security
Comment 54•9 years ago
|
||
This issue appears to be an issue Qanalyst is unable to verify. Marking QAExclude in QA Whiteboard.
QA Whiteboard: QAExclude
Flags: needinfo?(jmercado)
Updated•9 years ago
|
Flags: needinfo?(jmercado)
You need to log in
before you can comment on or make changes to this bug.
Description
•