Assertion failure: (next == this) == (prev == this) discarding images in LinkedList

RESOLVED FIXED in Firefox 17

Status

()

Core
ImageLib
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: bc, Assigned: Joe Drew (not getting mail))

Tracking

(Blocks: 1 bug, {assertion, regression, sec-critical})

Trunk
mozilla19
assertion, regression, sec-critical
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(firefox17 fixed, firefox18 fixed, firefox19+ fixed, firefox-esr1017+ fixed)

Details

(Whiteboard: [adv-track-main17+][adv-track-esr17+], URL)

Attachments

(5 attachments)

(Reporter)

Description

5 years ago
1. http://epic.vn/
   Lots of other examples in automation.

2. Assertion failure: (next == this) == (prev == this), at c:\work\mozilla\builds\nightly\mozilla\firefox-debug\dist\include\mozilla/LinkedList.h:165

mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::isInList() mozilla::image::RasterImage::DiscardingActive() mozilla::image::RasterImage::UnlockImage() imgRequestProxy::UnlockImage() nsDocument::RemoveImage(imgIRequest*, unsigned int)

Reproduced on Windows and Linux so far.

Also have seen Windows crashes at nsCOMPtr<nsIWeakReference>::~nsCOMPtr<nsIWeakReference>() | mozilla::image::RasterImage::~RasterImage() mozilla::image::RasterImage::`scalar deleting destructor'(unsigned int) + 0xe mozilla::image::RasterImage::Release() nsRefPtr<mozilla::image::Image>::~nsRefPtr<mozilla::image::Image>() imgRequest::~imgRequest()

that are rated low exploitable by breakpad.

There are a number of other stacks associated with this. Marking s-s.

Crashes OSX debug as well.
(Reporter)

Comment 1

5 years ago
the OSX crash was at:

0x000000010391f650 in _moz_cairo_surface_flush (surface=0x6000000013381b29) at /work/mozilla/builds/nightly/mozilla/gfx/cairo/cairo/src/cairo-surface.c:1107
1107	    if (surface->status)
Current language:  auto; currently minimal
(gdb) bt
#0  0x000000010391f650 in _moz_cairo_surface_flush (surface=0x6000000013381b29) at /work/mozilla/builds/nightly/mozilla/gfx/cairo/cairo/src/cairo-surface.c:1107
#1  0x00000001036fbc50 in gfxASurface::Flush (this=0x13401af00) at /work/mozilla/builds/nightly/mozilla/gfx/thebes/gfxASurface.cpp:246
#2  0x00000001015cc68f in imgFrame::Optimize (this=0x13401aa90) at /work/mozilla/builds/nightly/mozilla/image/src/imgFrame.cpp:330
#3  0x00000001015b5472 in mozilla::image::RasterImage::DecodingComplete (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:1429
#4  0x00000001015ac5bc in mozilla::image::Decoder::Finish (this=0x100c4aa00) at /work/mozilla/builds/nightly/mozilla/image/src/Decoder.cpp:123
#5  0x00000001015af709 in mozilla::image::RasterImage::ShutdownDecoder (this=0x11431aa90, aIntent=mozilla::image::RasterImage::eShutdownIntent_Interrupted) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:2426
#6  0x00000001015af3dd in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:279
#7  0x00000001015af215 in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:252
#8  0x00000001015af1e9 in mozilla::image::RasterImage::~RasterImage (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:252
#9  0x00000001015aecd9 in mozilla::image::RasterImage::Release (this=0x11431aa90) at /work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp:208
I believe this was fixed in bug 801405.
Depends on: 801405
(Reporter)

Comment 3

5 years ago
I'll rebuild and recheck the urls. That would explain the wide variety of crash signatures/stacks I'm seeing.
(Reporter)

Comment 4

5 years ago
Still running the full set of tests but so far I'm seeing 

###!!! ABORT: Invalid frame index!: 'aFrameNum < mFrames.Length()', file c:/work/mozilla/builds/nightly/mozilla/image/src/RasterImage.cpp, line 1292

with new builds.
ok can you maybe reduce this or at least come up with a local testcase?
(Reporter)

Comment 6

5 years ago
epic.vn doesn't reproduce locally but I have other candidates to try once the retest completes.

Comment 7

5 years ago
bug 802144 should be the proper fix here.
(Reporter)

Comment 8

5 years ago
Created attachment 672037 [details]
original.htm

the original.htm lithium testcase where I saved the html page (but not complete). This reproduces on win7 about 90% of the time. This still accesses the network.
(Reporter)

Comment 9

5 years ago
Created attachment 672038 [details]
438-did-round-1.htm

reduced to 34 lines with reproducibility of 30% still with network access. hopefully this helps. let me know if you need a better one and I'll try the other urls as well. PS this was from epic.vn.

Updated

5 years ago
Depends on: 802144
Unfortunately I'm doubtful of my ability to turn these into some kind of automated test. The problem is clear, but it is extraordinarily timing-dependent in ways that can't easily be constructed artificially.
if bug 802144 is the fix then this looks sec-critical, although the timing-dependent nature may prevent a reliable exploit so I'll go with sec-high
Keywords: sec-high
(Reporter)

Comment 12

5 years ago
This is not fixed. I have additional urls which reproduce the original assertion on the linked list. I have a number of urls I can retest. I'll check back in after I retest them.
(Reporter)

Comment 13

5 years ago
Created attachment 672754 [details]
valgrind.log.gz

http://alimama.vn/mua-ban/ban-theo-bo-set/?&p=2

==7234== Invalid write of size 8
==7234==    at 0x197C2A51: gpusLoadTransformFeedbackBuffers (in /System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/libGPUSupport.dylib)

==7234== Invalid write of size 8
==7234==    at 0x8816460: mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::setNextUnsafe(mozilla::image::DiscardTracker::Node*) (LinkedList.h:211)

==7234== Invalid read of size 1
==7234==    at 0x881638C: mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::asT() (LinkedList.h:189)

==7234== Invalid read of size 8
==7234==    at 0x840F79C: mozilla::TimeStamp::IsNull() const (TimeStamp.h:188)

==7234== Invalid read of size 8
==7234==    at 0x840F6E9: mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const (TimeStamp.h:202)

==7234== Invalid read of size 8
==7234==    at 0x840F6F7: mozilla::TimeStamp::operator-(mozilla::TimeStamp const&) const (TimeStamp.h:204)

==7234== Invalid read of size 8
==7234==    at 0x88156A4: mozilla::image::DiscardTracker::DiscardNow() (DiscardTracker.cpp:268)

==7234== Invalid read of size 1
==7234==    at 0x881D241: mozilla::image::RasterImage::CanDiscard() (RasterImage.cpp:2311)
(Reporter)

Comment 14

5 years ago
Note I am seeing assertions and crashes related to the above valgrind errors as well as a number of javascript and layout related assertions and crashes on this set of urls. They are all over the map. I'm calling is sec-critical.
Keywords: sec-high → sec-critical
(Assignee)

Comment 15

5 years ago
Bob, can you do a --track-origins valgrind run while we get this into someone's queue?
(Reporter)

Comment 16

5 years ago
the run was done with valgrind --dsymutil=yes --trace-children=yes --track-origins=yes --smc-check=all-non-file
(Assignee)

Comment 17

5 years ago
Ahem, next time I will look at your log before asking you for things you've already done
(Reporter)

Comment 18

5 years ago
(In reply to Bob Clary [:bc:] from comment #13)

> ==7234== Invalid write of size 8
> ==7234==    at 0x197C2A51: gpusLoadTransformFeedbackBuffers (in
> /System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/
> libGPUSupport.dylib)

Who can we report this to?
(Assignee)

Comment 19

5 years ago
Apple, I guess.
Assigning to Joe for his capable stewardship.
Assignee: nobody → joe

Updated

5 years ago
Blocks: 801453

Updated

5 years ago
No longer blocks: 801453
(Assignee)

Comment 21

5 years ago
Got a bit more of a stack by using --num-callers=20

==27880== Invalid write of size 8
==27880==    at 0x698C1C5: mozilla::image::DiscardTracker::Reset(mozilla::image::DiscardTracker::Node*) (LinkedList.h:211)
==27880==    by 0x6990038: mozilla::image::RasterImage::DecodingComplete() (RasterImage.cpp:1552)
==27880==    by 0x698BB04: mozilla::image::Decoder::Finish() (Decoder.cpp:123)
==27880==    by 0x698FCA5: mozilla::image::RasterImage::ShutdownDecoder(mozilla::image::RasterImage::eShutdownIntent) (RasterImage.cpp:2563)
==27880==    by 0x699494C: mozilla::image::RasterImage::~RasterImage() (RasterImage.cpp:404)
==27880==    by 0x6994B1B: mozilla::image::RasterImage::~RasterImage() (RasterImage.cpp:417)
==27880==    by 0x698C5FF: mozilla::image::RasterImage::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A633D: imgRequest::~imgRequest() (nsAutoPtr.h:874)
==27880==    by 0x69A63B3: imgRequest::~imgRequest() (imgRequest.cpp:107)
==27880==    by 0x69A3A98: imgRequest::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A8EAF: imgRequestProxy::~imgRequestProxy() (imgRequestProxy.cpp:89)
==27880==    by 0x69A73B7: imgRequestProxy::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x6C7EDC6: nsImageLoadingContent::MakePendingRequestCurrent() (nsImageLoadingContent.cpp:972)
==27880==    by 0x6C7F39F: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:678)
==27880==    by 0x6C7F590: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
==27880==    by 0x6D8236F: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
==27880==    by 0x6D59695: nsGenericHTMLElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) (nsGenericHTMLElement.cpp:344)
==27880==    by 0x724C28F: nsIDOMElement_SetAttribute(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:2060)
==27880==    by 0x1DE7F839: ???
==27880==    by 0x7F867E5: js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, JS::Value*, bool) (MethodJIT.cpp:1043)
==27880==  Address 0x25f1c3b0 is 144 bytes inside a block of size 344 free'd
==27880==    at 0x402B579: free (vg_replace_malloc.c:446)
==27880==    by 0x698C5FF: mozilla::image::RasterImage::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A633D: imgRequest::~imgRequest() (nsAutoPtr.h:874)
==27880==    by 0x69A63B3: imgRequest::~imgRequest() (imgRequest.cpp:107)
==27880==    by 0x69A3A98: imgRequest::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x69A8EAF: imgRequestProxy::~imgRequestProxy() (imgRequestProxy.cpp:89)
==27880==    by 0x69A73B7: imgRequestProxy::Release() (in /home/joe/src/mozilla-central/obj-ff-opt/toolkit/library/libxul.so)
==27880==    by 0x6C7EDC6: nsImageLoadingContent::MakePendingRequestCurrent() (nsImageLoadingContent.cpp:972)
==27880==    by 0x6C7F39F: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:678)
==27880==    by 0x6C7F590: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
==27880==    by 0x6D8236F: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
==27880==    by 0x6D59695: nsGenericHTMLElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) (nsGenericHTMLElement.cpp:344)
==27880==    by 0x724C28F: nsIDOMElement_SetAttribute(JSContext*, unsigned int, JS::Value*) (dom_quickstubs.cpp:2060)
==27880==    by 0x1DE7F839: ???
==27880==    by 0x7F867E5: js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, JS::Value*, bool) (MethodJIT.cpp:1043)
==27880==    by 0x7F86E04: js::mjit::JaegerShot(JSContext*, bool) (MethodJIT.cpp:1101)
==27880==    by 0x7D82815: js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) (jsinterp.cpp:2423)
==27880==    by 0x7D864CE: js::RunScript(JSContext*, JS::Handle<JSScript*>, js::StackFrame*) (jsinterp.cpp:324)
==27880==    by 0x7D8695B: js::InvokeKernel(JSContext*, JS::CallArgs, js::MaybeConstruct) (jsinterp.cpp:379)
==27880==    by 0x7D2DEAE: js_fun_call(JSContext*, unsigned int, JS::Value*) (jsinterp.h:109)
(Reporter)

Comment 22

5 years ago
fyi, i am seeing Assertion failure: (next == this) == (prev == this) on over 100 urls now. This does not include related crashes which may push the url count higher. The number and variety of crash signatures and other fatal assertion messages associated with these urls is quite staggering. In fact the noise from these crashes is overwhelming the signal from other bugs. It is difficult to tell them apart.
(Assignee)

Comment 23

5 years ago
Created attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Shutting down a decoder from the destructor is a little too high-touch, because it's really meant for an image that's going to be sticking around for a while, and we emphatically aren't.

The real problem here is that we removed ourselves from the DiscardTracker, then stopped the decoder correctly, which said "Oh, since we've finished decoding, time to make ourselves discardable" and re-added us to the DiscardTracker. Then our memory was deleted. Good times.
Attachment #673364 - Flags: review?(jmuizelaar)
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Review of attachment 673364 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +402,5 @@
>    if (mDecoder) {
> +    // Kill off our decode request, if it's pending.  (If not, this call is
> +    // harmless.)
> +    DecodeWorker::Singleton()->StopDecoding(this);
> +    mDecoder = nullptr;

I hope this is safe.
Attachment #673364 - Flags: review?(jmuizelaar) → review+
(Assignee)

Comment 25

5 years ago
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

[Security approval request comment]
How easily can the security issue be deduced from the patch?

Not easily. I'm not even sure how easily it can be reproduced.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

None, probably.

If not all supported branches, which bug introduced the flaw?

Very likely bug 505385.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

N/A

How likely is this patch to cause regressions; how much testing does it need?

It could cause issues in debug builds from old assertions firing, so needs a bunch of testing.
Attachment #673364 - Flags: sec-approval?
(Assignee)

Comment 26

5 years ago
Try run: https://tbpl.mozilla.org/?tree=Try&rev=aae4e0f4cee6
(Assignee)

Comment 27

5 years ago
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Josh, can you convince yourself whether this is correct or not?
Attachment #673364 - Flags: review?(josh)
Blocks: 505385
status-firefox-esr10: --- → unaffected
status-firefox18: --- → unaffected
status-firefox19: --- → affected
tracking-firefox19: --- → +
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

If this isn't affecting older branches we don't have to worry about the patch giving away too much (not that this one does!)

sec-approval+
Attachment #673364 - Flags: sec-approval? → sec-approval+
Flags: in-testsuite?
Comment on attachment 673364 [details] [diff] [review]
don't shut down decoders in ~RasterImage

Review of attachment 673364 [details] [diff] [review]:
-----------------------------------------------------------------

This looks safe to me, given that each image should only be in the request list at most once.
Attachment #673364 - Flags: review?(josh) → review+
(Assignee)

Comment 30

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b3247d6507b

I was never able to get an automated test to work unfortunately  :(
Flags: in-testsuite? → in-testsuite-
Target Milestone: --- → mozilla19
https://hg.mozilla.org/mozilla-central/rev/4b3247d6507b
Status: NEW → RESOLVED
Last Resolved: 5 years ago
status-firefox19: affected → fixed
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
Duplicate of this bug: 804041
(Reporter)

Comment 33

5 years ago
fyi, i retested all of the urls where automation had seen this assertion and could not reproduce after the patch landed. Thanks!

Comment 34

5 years ago
Bug 804041 was marked as a dupe of this and is tracking+ for 17 - should this patch land on Aurora and Beta as well? (And if so, should we track this for 17 and remove the dupe from tracking, as it shows up on the list of nonfixed trackers for that release?)
Something needs to land on Aurora + Beta, whether it's the code from bug 804041, bug 803688, or this bug.  I'd guess that the first two options are safer.  The first one is a bit smaller of a change, while the second one points a slightly larger arrow at the attack vector (but it's still not particularly obvious).
(Assignee)

Comment 36

5 years ago
Actually bug 804041 would be a bit more of an arrow at the vector. Either of bug 803688 or this bug would be less obvious. I think, however, that bug 804041 would be a bit better for uplift.

I'll let the security team decide which to push, though.
(Assignee)

Comment 37

5 years ago
Created attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

[Security approval request comment]
How easily can the security issue be deduced from the patch? More easily than the other options; see comment 36.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? All, probably.

How likely is this patch to cause regressions; how much testing does it need? Not terribly likely to regress things.
Attachment #675369 - Flags: sec-approval?
Attachment #675369 - Flags: review+
(Assignee)

Updated

5 years ago
Attachment #675369 - Attachment description: less risky patch for branches → bug 804041 patch (jlebar)
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

sec-approval+

This one looks good too, land which ever one fixes the problem(s) with the least risk of regressions on branches.
Attachment #675369 - Flags: sec-approval? → sec-approval+
(Assignee)

Comment 39

5 years ago
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not regression
User impact if declined: security bug
Testing completed (on m-c, etc.): A morally equivalent but riskier fix is on m-c.
Risk to taking this patch (and alternatives if risky): Less risky than the other alternatives.
String or UUID changes made by this patch: none
Attachment #675369 - Flags: approval-mozilla-beta?
Attachment #675369 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 40

5 years ago
While this assertion doesn't happen on other branches, the underlying bug does.
status-firefox-esr10: unaffected → affected
status-firefox17: --- → affected
status-firefox18: unaffected → affected
Comment on attachment 675369 [details] [diff] [review]
bug 804041 patch (jlebar)

been on central 3 days, approving for branches.
Attachment #675369 - Flags: approval-mozilla-beta?
Attachment #675369 - Flags: approval-mozilla-beta+
Attachment #675369 - Flags: approval-mozilla-aurora?
Attachment #675369 - Flags: approval-mozilla-aurora+
(Assignee)

Updated

5 years ago
Attachment #675369 - Flags: approval-mozilla-esr10?
(Assignee)

Comment 42

5 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/67f3adeedc95
https://hg.mozilla.org/releases/mozilla-beta/rev/8bd5b4da0d06
status-firefox17: affected → fixed
status-firefox18: affected → fixed

Updated

5 years ago
Attachment #675369 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+

Updated

5 years ago
tracking-firefox-esr10: --- → 17+
(Assignee)

Comment 43

5 years ago
https://hg.mozilla.org/releases/mozilla-esr10/rev/821d51ee03bd
status-firefox-esr10: affected → fixed
Whiteboard: [adv-track-main17+][adv-track-esr17+]
Kamil, can you please test to see if this is fixed? You should be able to reproduce this assertion on Windows using the debug Nightly from Oct 21st(1). Once reproduced, please test this is fixed using the latest debug Nightly(2), Aurora(3), Beta(4), and ESR10(5) builds. 

1. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-10-21-mozilla-central-debug/
2. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-central-debug/
3. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-aurora-debug/
4. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-beta-debug/
5. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-11-14-mozilla-esr10-debug/

Please coordinate with my via email or on IRC (ashughes in #qa on irc.mozilla.org) if you run into problems. Thanks.
When going through the following issue, am I looking for actual crashes or just a assertion dialog box? Currently, I am just receiving crashes with both of the central-debug versions linked by Anthony (2012-10-21 & 2012-11-14)

Here's a crash report that occurred on the 2012-11-14 central-debug version:

https://crash-stats.mozilla.com/report/index/bp-e5e2664b-024d-492a-911f-4225c2121116
I'm not sure you will see a crash but you should at least see the assertion. To see the assertion you will need to run debug Firefox from a command line. The assertion should show in the command line output when running the testcase (you won't see anything in the Firefox window).
(Reporter)

Comment 47

5 years ago
I ran the original test case locally over 100 times on Windows 7 32bit esxi vm and a debug Nightly from yesterday and could not reproduce a crash or assertion.

Kamil, in case you are seeing an artifact of previous testing, please reboot, install a fresh debug build and use a fresh profile.
Group: core-security
You need to log in before you can comment on or make changes to this bug.