Closed Bug 944353 Opened 6 years ago Closed 6 years ago

ABORT: Not allowed to make more decoder calls after error!: '!HasDecoderError()'

Categories

(Core :: ImageLib, defect)

x86
Windows 7
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla30
Tracking Status
firefox28 --- wontfix
firefox29 --- verified
firefox30 --- verified
firefox-esr24 --- verified
b2g18 --- unaffected
b2g-v1.1hd --- unaffected
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: cbook, Assigned: tnikkel)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, sec-high, testcase, Whiteboard: found in the wild [adv-main29+][adv-esr24.5+])

Attachments

(4 files)

Attached file windows stack
Found via bughunter on http://www.infovision3.com/
	
Loading this site result on windows 7 debug trunk build (and according to bughunter also beta/aurora) in
	
ABORT: Not allowed to make more decoder calls after error!: '!HasDecoderError()'

Will see that i can generate a testcase
Component: General → ImageLib
We should make the two cases of the "Not allowed to make more decoder calls" debug aborts into runtime checks if this is important.

Bobby: how badly are we screwed if this invariant is violated? Because it apparently happens.
Flags: needinfo?(bobbyholley+bmo)
Whiteboard: found in the wild
(In reply to Daniel Veditz [:dveditz] from comment #1)
> We should make the two cases of the "Not allowed to make more decoder calls"
> debug aborts into runtime checks if this is important.
> 
> Bobby: how badly are we screwed if this invariant is violated? Because it
> apparently happens.

I don't think it's really an issue of us being clearly screwed. It's more that we're just proceeding with potentially-undefined state. Certainly something "interesting" for security researchers, but nothing clear beyond that.

If we can get a reproducible testcase, it should be pretty straightforward to fix.
Flags: needinfo?(bobbyholley+bmo)
Does this still reproduce?  Would it be possible for you to produce a minimized test case?
Flags: needinfo?(cbook)
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Does this still reproduce?  Would it be possible for you to produce a
> minimized test case?

yeah still able on central, looking into a testcase

btw this is the debug output

JPEG decoding error:
Insufficient memory (case 4)
[2240] WARNING: Image decoding error - This is probably a bug!: file c:\Users\mozilla\debug-builds\mozilla-central\image\src\Decoder.cpp, line 416
[2240] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file c:/Users/mozilla/debug-builds/mozilla-central/content/base/src/nsContent
Utils.cpp, line 3063
++DOCSHELL 21E03CA0 == 14 [pid = 2240] [id = 20]
++DOMWINDOW == 27 (1A77B798) [pid = 2240] [serial = 44] [outer = 00000000]
[2240] ###!!! ABORT: Not allowed to make more decoder calls after error!: '!HasDecoderError()', file c:\Users\mozilla\debug-builds\mozilla-central\ima
ge\src\Decoder.cpp, line 95
[2240] WARNING: no buffer: 'mDTBuffer', file c:\Users\mozilla\debug-builds\mozilla-central\gfx\layers\RotatedBuffer.cpp, line 322
Flags: needinfo?(cbook)
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Does this still reproduce?  Would it be possible for you to produce a
> minimized test case?

did a local testcase and then finally found out the reason for this crash is just loading http://www.infovision3.com/s/img/background.jpg?1383915649 (so super minimized testcase :)

will add a local copy of this jpg also. bholley i hope this helps
Attached image testcase
load this jpg result in a crash of firefox trunk debug build on windows
I'm not working on imagelib anymore.
Flags: needinfo?(bobbyholley) → needinfo?(seth)
Thanks for the debug output from comment 4. Based on that it looks like this allocation is failing:
http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jmemmgr.c#376

When I load the attached testcase the largest value I see being allocated on that line is just over 200mb. I couldn't get the testcase to fail so I just made this line return null if the request is over 200mb. That reproduced the problem.

Carsten, 200mb should not be a problem for your system to allocate. Is there something unusual about your setup? If this is a typical or normal setup you may want to investigate why we can't fulfill that request. I know that we have been seeing cases where the contiguous amount of memory available is low due to fragmentation, maybe you are hitting that problem?
Flags: needinfo?(cbook)
So what happens is a memory allocation during decoding off the main thread fails. A error worker is posted to handle the error and set the error status on the main thread. Before the main thread can process the error worker it handles a OnImageDataAvailable notification resulting in AddSourceData adding the decode request for the image to the pending queue. When we process this new decode request we check the error status only, not if there is a pending error.

Fixing that problem by making decode workers no-opts when there is a pending error just leads to another abort if false that happens during the ShutdownDecoder call from DoError.
(In reply to Timothy Nikkel (:tn) from comment #8)
> Thanks for the debug output from comment 4. Based on that it looks like this
> allocation is failing:
> http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jmemmgr.c#376
> 
> When I load the attached testcase the largest value I see being allocated on
> that line is just over 200mb. I couldn't get the testcase to fail so I just
> made this line return null if the request is over 200mb. That reproduced the
> problem.
> 
> Carsten, 200mb should not be a problem for your system to allocate. Is there
> something unusual about your setup? If this is a typical or normal setup you
> may want to investigate why we can't fulfill that request. I know that we
> have been seeing cases where the contiguous amount of memory available is
> low due to fragmentation, maybe you are hitting that problem?

hm its a windows 7 system vm with 1 GB assigned Ram, should i try with more ram assigned to the vm ?
Flags: needinfo?(cbook)
(In reply to Timothy Nikkel (:tn) from comment #9)
> Fixing that problem by making decode workers no-opts when there is a pending
> error just leads to another abort if false that happens during the
> ShutdownDecoder call from DoError.

This one seems to be caused by the logic we have in Decoder::Finish to determine if a decode that didn't complete is still usable. Specifically http://hg.mozilla.org/mozilla-central/rev/305129d04cd1 (bug 802390) changed us from deciding we were usable from
  usable = !HasDecoderError() && mImage.GetNumFrames()
to (unrolling the logic)
  usable = (aShutdownIntent == RasterImage::eShutdownIntent_Interrupted) || HasDecoderError() || mImage.GetNumFrames() > 0
So HasDecoderError() being true used to force us to be un-usable, it now forces us to be usable. If HasDecoderError() is true we are going to immediately put our RasterImage into error mode right after this, so we should do the same here and make it not usable. I'll post patches for these two problems shortly.
Blocks: 802390, 716140
Assignee: nobody → tnikkel
Attachment #8373173 - Flags: review?(seth)
Flags: needinfo?(seth)
Comment on attachment 8373173 [details] [diff] [review]
If we have a pending error don't do any more decoding.

Review of attachment 8373173 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! Sorry for the slow review time on this one.
Attachment #8373173 - Flags: review?(seth) → review+
Comment on attachment 8373174 [details] [diff] [review]
If we've had a decoder error then the image is not usable.

Review of attachment 8373174 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: image/src/Decoder.cpp
@@ +159,1 @@
>      if (aShutdownIntent != RasterImage::eShutdownIntent_NotNeeded && !HasDecoderError()) {

I'd suggest to just replace the |!HasDecoderError()| in the |if| condition with |usable|. Maybe even move it before the |aShutdownIntent| check. I think this makes the intention more clear:

- If |usable| is false, we're definitely not usable.
- If |usable| is true, we might be usable, but do a little more checking (which this |if| block takes care of).
Attachment #8373174 - Flags: review?(seth) → review+
(In reply to Seth Fowler [:seth] from comment #15)
> Comment on attachment 8373174 [details] [diff] [review]
> If we've had a decoder error then the image is not usable.
> 
> Review of attachment 8373174 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good.
> 
> ::: image/src/Decoder.cpp
> @@ +159,1 @@
> >      if (aShutdownIntent != RasterImage::eShutdownIntent_NotNeeded && !HasDecoderError()) {
> 
> I'd suggest to just replace the |!HasDecoderError()| in the |if| condition
> with |usable|. Maybe even move it before the |aShutdownIntent| check. I
> think this makes the intention more clear:
> 
> - If |usable| is false, we're definitely not usable.
> - If |usable| is true, we might be usable, but do a little more checking
> (which this |if| block takes care of).

I think the condition is specific to not having a decoder error rather than just an optimization if the image is already marked unusable. The comment on the next line "If we only have a data error" meaning "we didn't have a decode error, and since we are here we must have had some error, so it must be a data error" indicates this. Of course this doesn't have any effect on the validity of the code, it will be the same either way as the code is currently written.
(In reply to Timothy Nikkel (:tn) from comment #16)
> (In reply to Seth Fowler [:seth] from comment #15)
> > Comment on attachment 8373174 [details] [diff] [review]
> > If we've had a decoder error then the image is not usable.
> > 
> > Review of attachment 8373174 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Looks good.
> > 
> > ::: image/src/Decoder.cpp
> > @@ +159,1 @@
> > >      if (aShutdownIntent != RasterImage::eShutdownIntent_NotNeeded && !HasDecoderError()) {
> > 
> > I'd suggest to just replace the |!HasDecoderError()| in the |if| condition
> > with |usable|. Maybe even move it before the |aShutdownIntent| check. I
> > think this makes the intention more clear:
> > 
> > - If |usable| is false, we're definitely not usable.
> > - If |usable| is true, we might be usable, but do a little more checking
> > (which this |if| block takes care of).
> 
> I think the condition is specific to not having a decoder error rather than
> just an optimization if the image is already marked unusable. The comment on
> the next line "If we only have a data error" meaning "we didn't have a
> decode error, and since we are here we must have had some error, so it must
> be a data error" indicates this. Of course this doesn't have any effect on
> the validity of the code, it will be the same either way as the code is
> currently written.

I can see that perspective too. Go with your gut. =)
landed on central as https://hg.mozilla.org/mozilla-central/rev/24dcfacab028 and https://hg.mozilla.org/mozilla-central/rev/f71474f7081b
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Shouldn't this have had sec-approval before landing?
Flags: needinfo?(tnikkel)
What's the criteria for that? There were/are no sec flags in the whiteboard or keywords field...
Flags: needinfo?(tnikkel)
Exactly my point.

> Principle: assume the worst
>   If there's no rating we assume it could be critical
>   If we don't know the regression range we assume it needs porting to all supported branches

https://wiki.mozilla.org/Security/Bug_Approval_Process
I'm sorry I wasn't aware of that and I took the failure to add even a "security triage needed" or "sec rating needed" tag to this bug in the 3 months since it was filed, and 3 weeks since it has had a patch as an indication that sec-approval wasn't needed.
(In reply to Timothy Nikkel (:tn) from comment #23)
> I'm sorry I wasn't aware of that and I took the failure to add even a
> "security triage needed" or "sec rating needed" tag to this bug in the 3
> months since it was filed, and 3 weeks since it has had a patch as an
> indication that sec-approval wasn't needed.

We don't have a "sec rating needed" tag.

Can you go back and answer the sec-approval? template? We need to know what versions of Firefox this bug affects since we checked it into trunk and might be exposed all over with it.

Also, we do need a rating here.
[Security approval request comment]
How easily could an exploit be constructed based on the patch?

I'm not sure if this is exploitable or not. Seth do you know more about the Decoder internals to say how badly this can go wrong?

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

By looking at the patch an attacker could deduce there was a problem in decoding images that have a decoder error. They would still need to go from there to an actual image that triggers a decoding error with the right timing.

Which older supported branches are affected by this flaw?
If not all supported branches, which bug introduced the flaw?

The patches fix two regressions, one from bug 716140 on from bug 802390. Bug 716140 indicates it landed on 22, bug 802390 indicates it landed on 19. So basically all branches would be affected.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Trivial.

How likely is this patch to cause regressions; how much testing does it need?

The code in Decoder.cpp is a little delicate, hopefully we would have found any problems 1.5 months after landing. The RasterImage.cpp patch should be pretty safe.


As for a rating that depends on the answer to the first question.
So it seems that we're waiting on Beta/ESR24 noms here?
Flags: needinfo?(tnikkel)
Comment on attachment 8373173 [details] [diff] [review]
If we have a pending error don't do any more decoding.

approval requests for both patches

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: I can't conclusively say this is sec-crit, but the state of the image decoders is made to violate invariants, once that happens things could get worse.
User impact if declined: internal state of image decoders get messed up
Fix Landed on Version: 30
Risk to taking this patch (and alternatives if risky): this code can be sensitive, but its a very good sign that we've so far detected no regressions and it's been on trunk for a good while now
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140 and bug 802390 
User impact if declined: internal state of image decoders get messed up
Testing completed (on m-c, etc.): been on m-c for quite a while now
Risk to taking this patch (and alternatives if risky): this code can be sensitive, but its a very good sign that we've so far detected no regressions and it's been on trunk for a good while now
String or IDL/UUID changes made by this patch: none
Attachment #8373173 - Flags: approval-mozilla-esr24?
Attachment #8373173 - Flags: approval-mozilla-beta?
Flags: needinfo?(tnikkel)
Comment on attachment 8373173 [details] [diff] [review]
If we have a pending error don't do any more decoding.

Accepting it. It is going to be in 29 beta 8 leaving us the opportunity to back out it in case of regressions.
Attachment #8373173 - Flags: approval-mozilla-esr24?
Attachment #8373173 - Flags: approval-mozilla-esr24+
Attachment #8373173 - Flags: approval-mozilla-beta?
Attachment #8373173 - Flags: approval-mozilla-beta+
It seems windows debug builds don't want to start on my machines. I get the MSVCR100D.dll missing error no matter what VC++ version I install.

Tomcat, can you please verify that this is fixed for you on the latest beta?
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-04-15-mozilla-beta-debug
Flags: needinfo?(cbook)
Keywords: sec-high
Whiteboard: found in the wild → found in the wild [adv-main29+][adv-esr24.5+]
I've verified that this does not crash on latest builds of Fx24.5.0esr, Fx29 and Fx30. 

However, I have not been able to reproduce the original issue on debug builds on or around 2013-11-28, any branch. I've tried Windows 7 as well as Linux and Mac. 

Ideally we'd like to see the problem before it was fixed, but in spite of a fair amount of effort, that's not happening. I am marking verified anyway.
(In reply to Matt Wobensmith from comment #32)
> I've verified that this does not crash on latest builds of Fx24.5.0esr, Fx29
> and Fx30. 
> 
> However, I have not been able to reproduce the original issue on debug
> builds on or around 2013-11-28, any branch. I've tried Windows 7 as well as
> Linux and Mac. 
> 
> Ideally we'd like to see the problem before it was fixed, but in spite of a
> fair amount of effort, that's not happening. I am marking verified anyway.

To see the crash you need malloc's of size 200mb to fail. So try a very memory constrained vm and/or make firefox use lots of the available virtual memory (use Windows 32bit builds due to the small virtual memory limit). I debugged this by manually making the mem request over 200mb fail in the jpeg decoder.
Thanks Timothy, good ideas.

I configured a VM with only 1GB of RAM. I tested by opening five tabs with this image, and then hit "refresh all tabs" a few times. I was able to get a consistent crash with a build of Fx28 from 2013-11-28. Same steps with fixed builds did not crash.
Flags: needinfo?(cbook)
Landed the test:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3b422965837
Group: core-security
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.