Closed Bug 864008 Opened 7 years ago Closed 7 years ago

ASan: conformance/textures/gl-teximage.html fails with "attempting to call malloc_usable_size() for pointer which is not owned"

Categories

(Core :: ImageLib, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: decoder, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: sec-other, testcase, Whiteboard: [asan][asan-test-failure][asan-test-blocker][asan-bug])

Attachments

(2 files)

Attached file ASan log with symbols
I'm seeing the following test failure on tbpl and locally when running the WebGL conformance test suite (full trace is attached) on mozilla-central revision aaa82856c7a9.


==8530== ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x600a0000efc0
    #0 0x4159b8 in __interceptor_malloc_usable_size _asan_rtl_
    #1 0x7fe85cfebce4 in gfxImageSurface::SizeOfExcludingThis(unsigned long (*)(void const*)) const gfx/thebes/gfxImageSurface.cpp:183
    #2 0x7fe85cfebd54 in gfxImageSurface::SizeOfIncludingThis(unsigned long (*)(void const*)) const gfx/thebes/gfxImageSurface.cpp:191
    #3 0x7fe85a535c68 in imgFrame::SizeOfExcludingThisWithComputedFallbackIfHeap(gfxASurface::MemoryLocation, unsigned long (*)(void const*)) const image/src/imgFrame.cpp:856
    #4 0x7fe85a5134c9 in mozilla::image::RasterImage::SizeOfDecodedWithComputedFallbackIfHeap(gfxASurface::MemoryLocation, unsigned long (*)(void const*)) const image/src/RasterImage.cpp:1170
    #5 0x7fe85a4fd752 in mozilla::image::ImageResource::SizeOfData() image/src/Image.cpp:41
    #6 0x7fe85a5506ce in imgRequest::UpdateCacheEntrySize() image/src/imgRequest.cpp:341
    #7 0x7fe85a5580f4 in imgRequestProxy::OnStopDecode() image/src/imgRequestProxy.cpp:748
    #8 0x7fe85a55f99a in imgStatusTracker::SyncNotifyState(nsTObserverArray<imgRequestProxy*>&, bool, unsigned int, nsIntRect&, bool) image/src/imgStatusTracker.cpp:512
    #9 0x7fe85a5603ae in imgStatusTracker::SyncNotifyDifference(imgStatusTracker::StatusDiff) image/src/imgStatusTracker.cpp:569
    #10 0x7fe85a516227 in mozilla::image::RasterImage::FinishedSomeDecoding(mozilla::image::RasterImage::eShutdownIntent, mozilla::image::RasterImage::DecodeRequest*) image/src/RasterImage.cpp:3478
    #11 0x7fe85a511966 in mozilla::image::RasterImage::SyncDecode() image/src/RasterImage.cpp:2920
    #12 0x7fe85a51225a in mozilla::image::RasterImage::GetFrame(unsigned int, unsigned int, gfxASurface**) image/src/RasterImage.cpp:1029
[...]

Afterwards, also a CHECK in ASan fails. This could also indicate an ASan bug (in addition to the regular bug here if there is any).

Marking this s-s because calling malloc_usable_size on an unowned pointer could indicate either use-after-free or a memory corruption of some sort.
This is in ImageLib, not in WebGL.
Component: Canvas: WebGL → ImageLib
It seems that this is not an issue with the ASan/Clang version we currently have on TBPL, the TBPL failure I'm seeing looks different. I might have a newer version locally. I'll upgrade to my local version to trunk and ensure it still reproduces. If this is a bug in our code, then we can hopefully get rid of it before we upgrade our Clang on TBPL because this bug seems to trigger very quickly.
needinfo?choller to report back on what happened after upgrading his local version to trunk.
Flags: needinfo?(choller)
Talked to decoder: he is looking into Clang version and will report back after more investigation.
I'm not hitting this with the Clang version we use on try, only with newer Clang. I still think it's possible that it's a bug in our codebase and it's just not popping up with the older Clang due to different compilation/optimization or a bug that has been fixed in ASan.
Flags: needinfo?(choller)
Josh, could you maybe take a look at this?
Flags: needinfo?(josh)
Assignee: nobody → josh
The line numbers in comment 0 are from revision aaa82856c7a9.
Flags: needinfo?(josh)
Duplicate of this bug: 876342
How bad is this, Josh?  Reading some random stuff doesn't sound too bad, but then if we use a bogus size value for deciding whether something is okay to read or not, it could be bad.
The testcase in bug 876342 might be useful in reproducing.
Flags: needinfo?(josh)
Attached file Test from bug 876342
This reproduces really easily and will make a fine crashtest.
Flags: needinfo?(josh)
Ok, the problem is in mozilla::image::RasterImage::SizeOfDecodedWithComputedFallbackIfHeap: the frame we get in the loop is pointing at invalid data, but I haven't figured out why that's the case yet.
However, given that we have a pointer in mFrames and we're presumably within the bounds of the array, this looks like a UAF.
Actually, that guess appears to be false. The actual pointer in the array is pointing at a totally valid image frame, but the stack appears to be being smashed such that the local variable |frame| is overwritten.
Overwritten local variable sounds bad.
I'm having more and more of a hard time convincing myself that this isn't an ASAN mistake. Looking through a full debug, no optimization build in gdb, I see:
* the frame variable is the same as mFrames.ElementAt(0) (the overwriting may have been an artifact of opt builds)
* all of the contents of *frame look fine
* *frame->mImageSurface looks totally sensible
* frame->mImageSurface->mSurface looks totally reasonable, and contains a pointer to the same surface data stored in frame->mImageSurface
* this pointer is the one that ASAN is freaking out about for reasons I can't fathom
Similarly, the underlying pixman image contains a pointer to the same buffer.
Ok, according to jlebar using malloc_usable_size on a block allocated by posix_memalign should be totally fine. That appears to be what ASAN is complaining about here, however; when I force TryAllocAlignedBytes (gfxImageSurface.cpp) to use moz_malloc instead of moz_posix_memalign, ASAN stops triggering on this testcase. I guess this is something we should report upstream.
Assignee: josh → nobody
Can you handle reporting the bug decoder?

Also, it sounds like we can unhide this?
Keywords: sec-other
Whiteboard: [asan][asan-test-failure][asan-test-blocker] → [asan][asan-test-failure][asan-test-blocker][asan-bug]
I've isolated a test for this: If you call posix_memalign with the parameters that are used in the failing test (size 36, alignment 16), and then call malloc_usable_size on that pointer, ASan aborts. I will confirm that this is still the case on tip and report a bug.
Group: core-security
Blocks: 870173
Fixed by upstream on llvm trunk r183647.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Thanks for investigating this, jdm and decoder!
You need to log in before you can comment on or make changes to this bug.