Bug 802168 (Opened 12 years ago, Closed 12 years ago)

Assertion failure: (next == this) == (prev == this) discarding images in LinkedList

Categories

Core :: Graphics: ImageLib, defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla19

Tracking Status:
firefox17 --- fixed
firefox18 --- fixed
firefox19 + fixed
firefox-esr10 17+ fixed

People

(Reporter: bc, Assigned: joe)

Details

Keywords: assertion, regression, sec-critical
Whiteboard: [adv-track-main17+][adv-track-esr17+]

Attachments

(5 files)

1. http://epic.vn/
   Lots of other examples in automation.

2. Assertion failure: (next == this) == (prev == this), at c:\work\mozilla\builds\nightly\mozilla\firefox-debug\dist\include\mozilla/LinkedList.h:165

mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::isInList()
mozilla::image::RasterImage::DiscardingActive()
mozilla::image::RasterImage::UnlockImage()
imgRequestProxy::UnlockImage()
nsDocument::RemoveImage(imgIRequest*, unsigned int)

Reproduced on Windows and Linux so far.
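
For readers not familiar with mozilla/LinkedList.h: the failing check is the sanity assertion in LinkedListElement::isInList(). Elements sit in a circular doubly-linked list, so an element that isn't in any list points at itself in both directions, and a linked element in neither; a mismatch means its links have been corrupted, for example by writes through a stale pointer. A minimal stand-alone sketch of the idea (not the actual Mozilla header):

```cpp
#include <cassert>

// Toy model of an intrusive, circular doubly-linked list element.
// Invariant: either both links point back at `this` (not in any list)
// or neither does (linked between two neighbours). A half-linked node
// is exactly what the failing assertion detects.
struct Element {
  Element* next;
  Element* prev;

  Element() : next(this), prev(this) {}

  bool isInList() const {
    // Assertion failure: (next == this) == (prev == this)
    assert((next == this) == (prev == this));
    return next != this;
  }

  void insertAfter(Element* other) {
    assert(!isInList());          // may belong to at most one list
    next = other->next;
    prev = other;
    other->next->prev = this;
    other->next = this;
  }

  void remove() {
    assert(isInList());
    prev->next = next;
    next->prev = prev;
    next = prev = this;           // restore the "not in a list" state
  }
};

int main() {
  Element sentinel, e;            // `sentinel` stands in for the list head
  e.insertAfter(&sentinel);
  assert(e.isInList());
  e.remove();
  assert(!e.isInList());
}
```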

I have also seen Windows crashes at:

nsCOMPtr<nsIWeakReference>::~nsCOMPtr<nsIWeakReference>()
mozilla::image::RasterImage::~RasterImage()
mozilla::image::RasterImage::`scalar deleting destructor'(unsigned int) + 0xe
mozilla::image::RasterImage::Release()
nsRefPtr<mozilla::image::Image>::~nsRefPtr<mozilla::image::Image>()
imgRequest::~imgRequest()

which Breakpad rates as low exploitability.

There are a number of other stacks associated with this. Marking s-s.

Crashes OS X debug builds as well.
The OS X crash was at:

0x000000010391f650 in _moz_cairo_surface_flush (surface=0x6000000013381b29) at /work/mozilla/builds/nightly/mozilla/gfx/cairo/cairo/src/cairo-surface.c:1107
1107	    if (surface->status)
Current language:  auto; currently minimal
(gdb) bt
#0  0x000000010391f650 in _moz_cairo_surface_flush (surface=0x6000000013381b29) at /work/mozilla/builds/nightly/mozilla/gfx/cairo/cairo/src/cairo-surface.c:1107
#1  0x00000001036fbc50 in gfxASurface::Flush (this=0x13401af00) at /work/mozilla/builds/nightly/mozilla/gfx/thebes/gfxASurface.cpp:246
#2  0x00000001015cc68f in imgFrame::Optimize (this=0x13401aa90) at /work/mozilla/builds/nightly/mozilla/image/src/imgFrame.cpp:330
#3  0x00000001015b5472 in mozilla::image::RasterImage::DecodingComplete (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:1429
#4  0x00000001015ac5bc in mozilla::image::Decoder::Finish (this=0x100c4aa00) at /work/mozilla/builds/nightly/mozilla/image/src/Decoder.cpp:123
#5  0x00000001015af709 in mozilla::image::RasterImage::ShutdownDecoder (this=0x11431aa90, aIntent=mozilla::image::RasterImage::eShutdownIntent_Interrupted) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:2426
#6  0x00000001015af3dd in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:279
#7  0x00000001015af215 in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:252
#8  0x00000001015af1e9 in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:252
#9  0x00000001015aecd9 in mozilla::image::RasterImage::Release (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:208
I believe this was fixed in bug 801405.
Depends on: 801405
I'll rebuild and recheck the URLs. That would explain the wide variety of crash signatures/stacks I'm seeing.
Still running the full set of tests but so far I'm seeing 

###!!! ABORT: Invalid frame index!: 'aFrameNum < mFrames.Length()', file c:/work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp, line 1292

with new builds.
OK, can you maybe reduce this, or at least come up with a local testcase?
epic.vn doesn't reproduce locally but I have other candidates to try once the retest completes.
bug 802144 should be the proper fix here.
Attached file original.htm
The original.htm Lithium testcase, where I saved the HTML page (but not as a complete save). This reproduces on Win7 about 90% of the time. It still accesses the network.
Attached file 438-did-round-1.htm
Reduced to 34 lines, with about 30% reproducibility, still with network access. Hopefully this helps; let me know if you need a better one and I'll try the other URLs as well. P.S. This was from epic.vn.
Depends on: 802144
Unfortunately I'm doubtful of my ability to turn these into some kind of automated test. The problem is clear, but it is extraordinarily timing-dependent in ways that can't easily be constructed artificially.
If bug 802144 is the fix, then this looks sec-critical, although the timing-dependent nature may prevent a reliable exploit, so I'll go with sec-high.
Keywords: sec-high
This is not fixed. I have additional URLs which reproduce the original assertion on the linked list, and a number of other URLs I can retest. I'll check back in after I retest them.
Attached file valgrind.log.gz
http://alimama.vn/mua-ban/ban-theo-bo-set/?&p=2

==7234== Invalid write of size 8
==7234==    at 0x197C2A51: gpusLoadTransformFeedbackBuffers (in /System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/libGPUSupport.dylib)

==7234== Invalid write of size 8
==7234==    at 0x8816460: mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::setNextUnsafe(mozilla::image::DiscardTracker::Node*) (LinkedList.h:211)

==7234== Invalid read of size 1
==7234==    at 0x881638C: mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::asT() (LinkedList.h:189)

==7234== Invalid read of size 8
==7234==    at 0x840F79C: mozilla::TimeStamp::IsNull() const (TimeStamp.h:188)

==7234== Invalid read of size 8
==7234==    at 0x840F6E9: mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const (TimeStamp.h:202)

==7234== Invalid read of size 8
==7234==    at 0x840F6F7: mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const (TimeStamp.h:204)

==7234== Invalid read of size 8
==7234==    at 0x88156A4: mozilla::image::DiscardTracker::DiscardNow() (DiscardTracker.cpp:268)

==7234== Invalid read of size 1
==7234==    at 0x881D241: mozilla::image::RasterImage::CanDiscard() (RasterImage.cpp:2311)
Note that I am seeing assertions and crashes related to the above Valgrind errors, as well as a number of JavaScript- and layout-related assertions and crashes, on this set of URLs. They are all over the map. I'm calling it sec-critical.
Keywords: sec-high → sec-critical
Bob, can you do a --track-origins valgrind run while we get this into someone's queue?
The run was done with: valgrind --dsymutil=yes --trace-children=yes --track-origins=yes --smc-check=all-non-file
Ahem, next time I will look at your log before asking you for things you've already done.
(In reply to Bob Clary [:bc:] from comment #13)

> ==7234== Invalid write of size 8
> ==7234==    at 0x197C2A51: gpusLoadTransformFeedbackBuffers (in
> /System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/
> libGPUSupport.dylib)

Who can we report this to?
Apple, I guess.
Assigning to Joe for his capable stewardship.
Assignee: nobody → joe
Blocks: 801453
No longer blocks: 801453
Got a bit more of a stack by using --num-callers=20

==27880== Invalid write of size 8
==27880==    at 0x698C1C5: mozilla::image::DiscardTracker::Reset(mozilla::image::DiscardTracker::Node*) (LinkedList.h:211)
==27880==    by 0x6990038: mozilla::image::RasterImage::DecodingComplete() (RasterImage.cpp:1552)
==27880==    by 0x698BB04: mozilla::image::Decoder::Finish() (Decoder.cpp:123)
==27880==    by 0x698FCA5: mozilla::image::RasterImage::ShutdownDecoder(mozilla::image::RasterImage::eShutdownIntent) (RasterImage.cpp:2563)
==27880==    by 0x699494C: mozilla::image::RasterImage::~RasterImage() (RasterImage.cpp:404)
==27880==    by 0x6994B1B: mozilla::image::RasterImage::~RasterImage() (RasterImage.cpp:417)
==27880==    by 0x698C5FF: mozilla::image::RasterImage::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A633D: imgRequest::~imgRequest() (nsAutoPtr.h:874)
==27880==    by 0x69A63B3: imgRequest::~imgRequest() (imgRequest.cpp:107)
==27880==    by 0x69A3A98: imgRequest::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A8EAF: imgRequestProxy::~imgRequestProxy() (imgRequestProxy.cpp:89)
==27880==    by 0x69A73B7: imgRequestProxy::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x6C7EDC6: nsImageLoadingContent::MakePendingRequestCurrent() (nsImageLoadingContent.cpp:972)
==27880==    by 0x6C7F39F: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:678)
==27880==    by 0x6C7F590: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
==27880==    by 0x6D8236F: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
==27880==    by 0x6D59695: nsGenericHTMLElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) (nsGenericHTMLElement.cpp:344)
==27880==    by 0x724C28F: nsIDOMElement_SetAttribute(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:2060)
==27880==    by 0x1DE7F839: ???
==27880==    by 0x7F867E5: js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, JS::Value*, bool) (MethodJIT.cpp:1043)
==27880==  Address 0x25f1c3b0 is 144 bytes inside a block of size 344 free'd
==27880==    at 0x402B579: free (vg_replace_malloc.c:446)
==27880==    by 0x698C5FF: mozilla::image::RasterImage::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A633D: imgRequest::~imgRequest() (nsAutoPtr.h:874)
==27880==    by 0x69A63B3: imgRequest::~imgRequest() (imgRequest.cpp:107)
==27880==    by 0x69A3A98: imgRequest::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A8EAF: imgRequestProxy::~imgRequestProxy() (imgRequestProxy.cpp:89)
==27880==    by 0x69A73B7: imgRequestProxy::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x6C7EDC6: nsImageLoadingContent::MakePendingRequestCurrent() (nsImageLoadingContent.cpp:972)
==27880==    by 0x6C7F39F: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:678)
==27880==    by 0x6C7F590: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
==27880==    by 0x6D8236F: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
==27880==    by 0x6D59695: nsGenericHTMLElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) (nsGenericHTMLElement.cpp:344)
==27880==    by 0x724C28F: nsIDOMElement_SetAttribute(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:2060)
==27880==    by 0x1DE7F839: ???
==27880==    by 0x7F867E5: js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, JS::Value*, bool) (MethodJIT.cpp:1043)
==27880==    by 0x7F86E04: js::mjit::JaegerShot(JSContext*, bool) (MethodJIT.cpp:1101)
==27880==    by 0x7D82815: js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) (jsinterp.cpp:2423)
==27880==    by 0x7D864CE: js::RunScript(JSContext*, JS::Handle<JSScript*>, js::StackFrame*) (jsinterp.cpp:324)
==27880==    by 0x7D8695B: js::InvokeKernel(JSContext*, JS::CallArgs, js::MaybeConstruct) (jsinterp.cpp:379)
==27880==    by 0x7D2DEAE: js_fun_call(JSContext*, unsigned int, JS::Value*) (jsinterp.h:109)
FYI, I am seeing Assertion failure: (next == this) == (prev == this) on over 100 URLs now. This does not include related crashes, which may push the URL count higher. The number and variety of crash signatures and other fatal assertion messages associated with these URLs is quite staggering. In fact, the noise from these crashes is overwhelming the signal from other bugs; it is difficult to tell them apart.
Shutting down a decoder from the destructor is a little too high-touch, because it's really meant for an image that's going to be sticking around for a while, and we emphatically aren't.

The real problem here is that we removed ourselves from the DiscardTracker, then stopped the decoder correctly, which said "Oh, since we've finished decoding, time to make ourselves discardable" and re-added us to the DiscardTracker. Then our memory was deleted. Good times.
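
In other words: the destructor removes the image from the DiscardTracker, then shuts the decoder down, and the decoder's completion path re-registers the dying image with the tracker, leaving the tracker holding a pointer into memory that is freed moments later. A self-contained sketch of that shape, using illustrative stand-ins rather than the real RasterImage/Decoder/DiscardTracker APIs:

```cpp
#include <cstdio>
#include <list>

struct Image;

// Global "discardable images" list; stand-in for the DiscardTracker.
struct Tracker {
  static std::list<Image*> sImages;
  static void Add(Image* img)    { sImages.push_back(img); }
  static void Remove(Image* img) { sImages.remove(img); }
};
std::list<Image*> Tracker::sImages;

struct Decoder {
  Image* mImage;
  explicit Decoder(Image* img) : mImage(img) {}
  void Finish();               // calls back into the image (defined below)
};

struct Image {
  Decoder* mDecoder = nullptr;

  void DecodingComplete() {
    // "Since we've finished decoding, time to make ourselves discardable."
    Tracker::Add(this);
  }

  ~Image() {
    Tracker::Remove(this);     // leave the tracker first...
    if (mDecoder) {
      mDecoder->Finish();      // ...then finish the decoder, which re-adds us
      delete mDecoder;
    }
  }                            // the object is freed right after this, but
                               // the tracker still points at it: use-after-free
};

void Decoder::Finish() {
  mImage->DecodingComplete();
}

int main() {
  Image* img = new Image;
  img->mDecoder = new Decoder(img);
  delete img;
  // One dangling pointer is left behind; any later "discard" pass would
  // touch freed memory, which is what the assertion and Valgrind caught.
  std::printf("dangling tracker entries: %zu\n", Tracker::sImages.size());
  return 0;
}
```

The attached patch avoids the re-registration by not running the full decoder shutdown from the destructor.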
Attachment #673364 - Flags: review?(jmuizelaar)
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Review of attachment 673364 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +402,5 @@
>    if (mDecoder) {
> +    // Kill off our decode request, if it's pending.  (If not, this call is
> +    // harmless.)
> +    DecodeWorker::Singleton()->StopDecoding(this);
> +    mDecoder = nullptr;

I hope this is safe.
Attachment #673364 - Flags: review?(jmuizelaar) → review+
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

[Security approval request comment]
How easily can the security issue be deduced from the patch?

Not easily. I'm not even sure how easily it can be reproduced.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

None, probably.

If not all supported branches, which bug introduced the flaw?

Very likely bug 505385.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

N/A

How likely is this patch to cause regressions; how much testing does it need?

It could cause issues in debug builds from old assertions firing, so it needs a bunch of testing.
Attachment #673364 - Flags: sec-approval?
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Josh, can you convince yourself whether this is correct or not?
Attachment #673364 - Flags: review?(josh)
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

If this isn't affecting older branches we don't have to worry about the patch giving away too much (not that this one does!)

sec-approval+
Attachment #673364 - Flags: sec-approval? → sec-approval+
Flags: in-testsuite?
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Review of attachment 673364 [details] [diff] [review]:
-----------------------------------------------------------------

This looks safe to me, given that each image should only be in the request list at most once.
Attachment #673364 - Flags: review?(josh) → review+
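
To unpack the reasoning: the destructor-side cleanup (StopDecoding plus dropping mDecoder) can be a simple remove-if-present precisely because an image can appear in the decode request list at most once, so there is never a second, stale entry left behind. A small illustrative sketch of that property; DecodeQueue here is a stand-in, not the real DecodeWorker:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Image {};   // stand-in for RasterImage

// Illustrative decode-request queue: the invariant we rely on is that
// each image appears at most once.
struct DecodeQueue {
  std::vector<Image*> mRequests;

  void RequestDecode(Image* img) {
    // Enforce at-most-once membership instead of pushing blindly.
    if (std::find(mRequests.begin(), mRequests.end(), img) == mRequests.end())
      mRequests.push_back(img);
  }

  void StopDecoding(Image* img) {
    // Harmless if `img` isn't queued; complete if it is, because there
    // can never be a second entry left pointing at a destroyed image.
    mRequests.erase(std::remove(mRequests.begin(), mRequests.end(), img),
                    mRequests.end());
  }
};

int main() {
  DecodeQueue queue;
  Image img;
  queue.RequestDecode(&img);
  queue.RequestDecode(&img);        // still only one entry
  assert(queue.mRequests.size() == 1);
  queue.StopDecoding(&img);         // safe to call from a destructor path
  queue.StopDecoding(&img);         // and harmless to call again
  assert(queue.mRequests.empty());
}
```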
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b3247d6507b

Unfortunately, I was never able to get an automated test to work. :(
Flags: in-testsuite? → in-testsuite-
Target Milestone: --- → mozilla19
https://hg.mozilla.org/mozilla-central/rev/4b3247d6507b
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
FYI, I retested all of the URLs where automation had seen this assertion and could not reproduce it after the patch landed. Thanks!
Bug 804041 was marked as a dupe of this and is tracking+ for 17. Should this patch land on Aurora and Beta as well? (And if so, should we track this for 17 and remove the dupe from tracking, since it shows up on the list of unfixed trackers for that release?)
Something needs to land on Aurora and Beta, whether it's the code from bug 804041, bug 803688, or this bug. I'd guess that the first two options are safer. The first is a somewhat smaller change, while the second points a slightly larger arrow at the attack vector (but it's still not particularly obvious).
Actually, bug 804041 would point a bit more of an arrow at the vector. Either bug 803688 or this bug would be less obvious. I think, however, that bug 804041 would be a bit better for uplift.

I'll let the security team decide which to push, though.
[Security approval request comment]
How easily can the security issue be deduced from the patch? More easily than the other options; see comment 36.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? All, probably.

How likely is this patch to cause regressions; how much testing does it need? Not terribly likely to regress things.
Attachment #675369 - Flags: sec-approval?
Attachment #675369 - Flags: review+
Attachment #675369 - Attachment description: less risky patch for branches → bug 804041 patch (jlebar)
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

sec-approval+

This one looks good too, land which ever one fixes the problem(s) with the least risk of regressions on branches.
Attachment #675369 - Flags: sec-approval? → sec-approval+
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not a regression
User impact if declined: security bug
Testing completed (on m-c, etc.): A morally equivalent but riskier fix is on m-c.
Risk to taking this patch (and alternatives if risky): Less risky than the other alternatives.
String or UUID changes made by this patch: none
Attachment #675369 - Flags: approval-mozilla-beta?
Attachment #675369 - Flags: approval-mozilla-aurora?
While this assertion doesn't happen on other branches, the underlying bug does.
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

This has been on central for 3 days; approving for branches.
Attachment #675369 - Flags: approval-mozilla-beta?
Attachment #675369 - Flags: approval-mozilla-beta+
Attachment #675369 - Flags: approval-mozilla-aurora?
Attachment #675369 - Flags: approval-mozilla-aurora+
Attachment #675369 - Flags: approval-mozilla-esr10?
Attachment #675369 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Whiteboard: [adv-track-main17+][adv-track-esr17+]
Kamil, can you please test whether this is fixed? You should be able to reproduce this assertion on Windows using the debug Nightly from Oct 21st (1). Once reproduced, please verify it is fixed using the latest debug Nightly (2), Aurora (3), Beta (4), and ESR10 (5) builds.

1. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-10-21-mozilla-central-debug/
2. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-central-debug/
3. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-aurora-debug/
4. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-beta-debug/
5. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-esr10-debug/

Please coordinate with me via email or on IRC (ashughes in #qa on irc.mozilla.org) if you run into problems. Thanks.
When going through this issue, am I looking for actual crashes or just an assertion dialog box? Currently, I am only getting crashes with both of the central-debug versions linked by Anthony (2012-10-21 & 2012-11-14).

Here's a crash report that occurred on the 2012-11-14 central-debug version:

https://crash-stats.mozilla.com/report/index/bp-e5e2664b-024d-492a-911f-4225c2121116
I'm not sure you will see a crash, but you should at least see the assertion. To see it, you will need to run debug Firefox from a command line; the assertion should show up in the command-line output when running the testcase (you won't see anything in the Firefox window).
I ran the original test case locally over 100 times on a Windows 7 32-bit ESXi VM with a debug Nightly from yesterday and could not reproduce a crash or an assertion.

Kamil, in case you are seeing an artifact of previous testing, please reboot, install a fresh debug build and use a fresh profile.
Group: core-security