Closed Bug 1236359 Opened 8 years ago Closed 8 years ago

OSX 10.6 memory corruption crashes with CGImageCreate: invalid image bits/pixel or bytes/row

Categories

(Core :: Graphics, defect)

46 Branch
x86_64
macOS
defect
Not set
critical

Tracking

VERIFIED FIXED
mozilla46
Tracking Status
firefox43 --- unaffected
firefox44 --- unaffected
firefox45 + fixed
firefox46 + fixed
firefox-esr38 --- unaffected
firefox-esr45 --- fixed

People

(Reporter: bc, Assigned: mstange)

Details

(Keywords: crash, regression, sec-critical, Whiteboard: [fixed-by-bug-1241665])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-9ffb4333-5083-4822-a591-8e2522160103.
=============================================================

This crash appears to be OSX 10.6 only.

1. http://2015.strava.com/video/SFsBZCR
2. Reload until you crash.
   I use the Developer Web Console to open a tab, then run setInterval('opener.document.location.reload()', 60000) in that tab's Web Console to automate reloading.
3. Crash with multiple signatures.

In addition to the crash above, I hit bp-9e15a80e-739f-48f1-9b38-0e6302160103, which is another crash in js::GCMarker::lazilyMarkChildren, and bp-bc003fed-21d4-4d7b-8290-b136d2160103  [@ jemalloc_crash | arena_malloc | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T> ]

Marking s-s since this is GC related and since I hit the GC-related crash 2 out of the 3 times I crashed.

If you don't have access to OSX 10.6 I have 3 machines in SCL3 but you may need to request vpn access to bughunter-osx-00{1..3}.ateam.scl3.mozilla.com.

It does appear to be a regression. I'll try to narrow it down.
I did a manual regression search using mozregression and it points to 

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fbe047a4fbf493a602d30ff0fb903b59ab384d09&tochange=c0b3b26d05cfdcb53d9877a6f9388f7d3d2e20d6

Bug 1187322 - Don't require accelerated OpenGL contexts for BasicCompositor on OS X whose original bug summary is
Bug 1187322 - [e10s] HTML document rendering doesn't work with multi-process on OSX 10.6.8 in VirtualBox

So, 10.6 does make sense in this context. Note this is on an actual physical machine running 10.6.

Note that I've seen a recent explosion in crashes on 10.6 which may be related to this.
Blocks: 1187322
Note the original url displays a number of CGImageCreate: invalid image bits/pixel or bytes/row errors.

While the crash is not that reliable when pasting the url into a running instance of Firefox, starting Firefox from the command line with the url also shows the CGImageCreate errors and a segmentation fault.

/Applications/FirefoxNightly.app/Contents/MacOS/firefox -P default 'http://2015.strava.com/video/SFsBZCR'

I used this to load the url 10 times per test and automated mozregression to find the same regression range. I saw messages related to 'pointer being freed was not allocated' and believe this is not GC related at all but is instead evidence of memory corruption.
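
The load-it-10-times check described above could be scripted roughly as follows. This is only a sketch, not the script actually used for this bug: the function names, the 60 second timeout and the decision to treat "still running at the timeout" as "survived" are all assumptions. It relies on POSIX behaviour, where a process killed by a signal reports a negative return code; if Firefox is started through a wrapper shell script, the crash may instead show up as exit code 139.

```python
import subprocess

# Warning text quoted in this bug; matched as a plain substring.
CG_WARNING = "CGImageCreate: invalid image bits/pixel or bytes/row"


def run_once(cmd, timeout_s):
    """Run cmd once and classify the outcome (POSIX signal semantics)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so one scan covers both
        text=True,
        errors="replace",  # tolerate non-UTF-8 bytes in the output
    )
    timed_out = False
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Still running after the timeout: count it as "survived" and clean up.
        timed_out = True
        proc.kill()
        output, _ = proc.communicate()
    return {
        # A negative return code means death by signal (-11 is SIGSEGV).
        # Our own kill() after a timeout is deliberately not counted.
        "crashed": (not timed_out) and proc.returncode < 0,
        "warned": CG_WARNING in output,
        "timed_out": timed_out,
    }


def reproduce(cmd, runs=10, timeout_s=60.0):
    """Run cmd `runs` times and count crashes, warnings and timeouts."""
    totals = {"crashed": 0, "warned": 0, "timed_out": 0}
    for _ in range(runs):
        result = run_once(cmd, timeout_s)
        for key in totals:
            totals[key] += int(result[key])
    return totals
```

With the command line quoted above, a call would look like reproduce(["/Applications/FirefoxNightly.app/Contents/MacOS/firefox", "-P", "default", "http://2015.strava.com/video/SFsBZCR"]). A non-zero "crashed" count could then serve as the "bad" verdict for a bisection step.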

I have other crashes for 10.6 which I will attempt to bisect and if they point back to the same bug, then I believe the memory corruption will be confirmed.
Component: JavaScript: GC → Graphics
bp-bc37bce5-9eba-4de8-8217-edbae2160103
[@ js::gc::GCRuntime::markCompartments ]

bp-359efcb1-6a32-425d-b9d6-0dd6d2160103
[@ js::jit::FinishOffThreadBuilder ]

both have the same NSFW url. You can get the url from the crash reports if you have permission to do so. These crashes have the same regression range as before.

In my opinion, Bug 1187322 should be backed out.
Crash Signature: [@ js::GCMarker::lazilyMarkChildren] [@ jemalloc_crash | arena_malloc | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T> ] → [@ js::GCMarker::lazilyMarkChildren] [@ jemalloc_crash | arena_malloc | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T> ] [@ js::gc::GCRuntime::markCompartments ] [@ js::jit::FinishOffThreadB…
Summary: OSX 10.6 crash in js::GCMarker::lazilyMarkChildren → OSX 10.6 memory corruption crashes with CGImageCreate: invalid image bits/pixel or bytes/row
Flags: needinfo?(mstange)
Keywords: sec-critical
I'm requesting tracking for 45 as well because bug 1187322 is being requested for uplift to Aurora.
sec-critical, new crash, tracking it.
Group: core-security → gfx-core-security
This could be caused by our use of Apple's software OpenGL renderer, or by our own code in BasicCompositor.

Bob, can you reproduce this on a more recent version of OS X if you disable hardware acceleration?

I'd expect the CGImageCreate errors to come from bugs in our own code. Unfortunately I don't see them on my machine.
I have a 10.6 machine somewhere, I'll go look for it tomorrow.

I found two bugs where we were using uninitialized memory in BasicCompositor (bug 1238753), but I didn't find any obvious memory corruption bugs.
Flags: needinfo?(mstange)
bp-a045594b-187d-4a8e-8483-0d5892160112
OS X 10.9 layers.acceleration.disabled true
[@ mozilla::layers::CloneLayerTreePropertiesInternal ]

debug
AVF info: Successfully connected to the Intel plugin, offline Gen6 
Jan 12 10:42:45 bughunter-osx-008.ateam.scl3.mozilla.com firefox-bin[94931] <Error>: CGImageCreate: invalid image bits/pixel or bytes/row.
... repeated
/mozilla/bin/firefox-runner.sh: line 189: 94931 Segmentation fault: 11  "$executable" -P $profile $@

I need to refresh my install of valgrind and do a new local build. More news as I get it.
I just landed a patch in bug 1238755 that makes us use Skia instead of CoreGraphics for BasicCompositor. That might get rid of the CGImageCreate warnings, or replace them with other warnings.
more release crashes. some of these have scribble set, but I'm not sure which. :-( sorry.

bp-a045594b-187d-4a8e-8483-0d5892160112
[@ mozilla::layers::CloneLayerTreePropertiesInternal ]

bp-24c74396-4a8e-4a4f-9c70-202d02160112
[@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER ]

bp-e9861114-5f47-4dfc-8cca-ea6822160112
[@ jemalloc_crash | arena_bin_malloc_easy | arena_malloc | je_malloc | moz_xmalloc | nsRunnableMethodTraits<T>::base_type* NS_NewRunnableMethodWithArg<T> ]

bp-b5df3922-e02a-4eaf-9531-a52b72160112
[@ mozilla::gfx::DrawTargetCG::CopySurface ]

bp-53fdcba9-d63d-4c8b-a2d9-27d842160112
[@ jemalloc_crash | arena_bin_malloc_easy | arena_malloc | je_malloc | malloc_zone_malloc | malloc | libsystem_c.dylib@0x44616 ]

The variety of crashes is troublesome. I'll get a valgrind report to you soon.

I'll retest the url as soon as your patch lands on m-c.
I was able to reproduce the CGImageCreate warnings on a page with 3d transforms and filed bug 1239137 about it. I don't expect that to fix any memory corruption, though.
Unfortunately, it appears that valgrind tip on OS X 10.9 is stuck and I won't be able to provide more feedback. :jseward, can you provide any valgrind results on OSX 10.8 or later?
Flags: needinfo?(jseward)
Bob, is this still happening for you?
Flags: needinfo?(bob)
yep, 

bp-30593def-8a06-4459-a7f3-578ea2160120
build 20160103030302
[@ jemalloc_crash | arena_malloc | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T> ]

bp-0389aeb8-ea23-4a46-b0b4-827402160120
build 20160120030239
[@ JSCompartment::findOutgoingEdges ]

bughunter shows a much higher rate of crashing from 10.6. This should have been fixed or backed out long before now.
Flags: needinfo?(bob)
ditto 10.9
layers.acceleration.disabled true
bp-088e118d-5953-44a3-9b41-7df272160120

about:support Graphics
Graphics
Asynchronous Pan/Zoom	none
Device ID	0x0126
GPU Accelerated Windows	0/1 Basic (OMTC)
Supports Hardware H264 Decoding	No;
Vendor ID	0x8086
WebGL Renderer	ATI Technologies Inc. -- AMD Radeon HD 6770M OpenGL Engine
windowLayerManagerRemote	true
AzureCanvasBackend	skia
AzureContentBackend	quartz
AzureFallbackCanvasBackend	none
AzureSkiaAccelerated	1
Depends on: 1241139
I have a good idea of what's going on now.

It looks like the MacIOSurfaceTextureSourceBasic, which was added in bug 942358, just doesn't work with video. It creates a SourceSurface with stride == width and SurfaceFormat::RGBX. (The surface's real format is SurfaceFormat::NV12.) That's bound to lead to memory corruption - SurfaceFormat::RGBX is expected to use four bytes per pixel, not one.
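
To put rough numbers on that mismatch, here is a small arithmetic sketch. It is not Gecko code: the function names are made up for illustration and the 1280x720 frame size is a hypothetical example, not taken from the crash reports. It assumes a tightly packed NV12 frame with even dimensions, i.e. a full-resolution Y plane at one byte per pixel followed by an interleaved CbCr plane at half resolution in each direction.

```python
RGBX_BYTES_PER_PIXEL = 4  # SurfaceFormat::RGBX, as stated in the comment above


def nv12_frame_bytes(width, height):
    """Bytes in a tightly packed NV12 frame with even dimensions."""
    luma = width * height           # Y plane: 1 byte per pixel
    chroma = width * (height // 2)  # interleaved CbCr at half resolution
    return luma + chroma


def rgbx_bytes_assumed(width, height):
    """Bytes a consumer expects if it believes the surface is RGBX."""
    return width * RGBX_BYTES_PER_PIXEL * height


def stride_fits_rgbx(stride, width):
    """True if one row of RGBX pixels fits within the given stride."""
    return stride >= width * RGBX_BYTES_PER_PIXEL


# Hypothetical 1280x720 video frame (an assumption for illustration).
width, height = 1280, 720
actual = nv12_frame_bytes(width, height)
assumed = rgbx_bytes_assumed(width, height)
print(actual, assumed, assumed - actual)
print(stride_fits_rgbx(stride=width, width=width))
```

For this example frame the NV12 data occupies 1382400 bytes, while code that trusts the RGBX label would expect 3686400 bytes, which is 2304000 bytes more than the buffer holds. A stride equal to the width is also too small to hold even a single row of four-byte pixels, so the final check prints False.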

So why couldn't I reproduce the bug on my machine? Apparently, it's because at some point in the past I installed ffmpeg, in such a way that Firefox picks up the library and prefers the ffmpeg decoder over the Apple decoder. Only the Apple decoder will create IOSurface-backed video frames; the ffmpeg decoder uses DataTextureSourceBasic.
Blocks: 942358
Flags: needinfo?(jseward)
I've filed bug 1241665 for fixing this. With those patches I no longer see any crashes, either on my machine (with ffmpeg disabled) or on the bughunter 009 machine.
I manually tested the url with a current opt Nightly on OSX 10.6 and ran the url through bughunter twice and could not reproduce any crashes. Calling this fixed by bug 1241665.

I'm unsure of the scope of where this bug exists. My regression range did not include bug 942358 which has already merged to mozilla-aurora.

Does bug 1241665 need to be merged to Aurora or is bug 1187322 necessary to trigger the corruption?
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mstange)
Resolution: --- → FIXED
Whiteboard: [fixed-by-bug-1241665]
It's a little complicated because Mac BasicCompositor has been disabled and un-disabled accidentally a few times in the past; the buggy code has been in the tree for a while but just wasn't used.
Just before bug 1187322, BasicCompositor was completely inaccessible on OS X, but that bug fixed things so that BasicCompositor on OS X is now always used when hardware acceleration is off. So yes, we only really need to uplift this patch to branches that get bug 1187322 uplifted to them.
Flags: needinfo?(mstange)
Group: gfx-core-security → core-security-release
From what I can tell, bug 1187322 is on 45, and so is bug 1241665 which fixes the problem, so we should be on the correct trains.
Bob, can you confirm that Beta/Aurora/Trunk are looking good these days?
Assignee: nobody → mstange
Flags: needinfo?(bob)
Target Milestone: --- → mozilla46
tested Firefox, Beta, Aurora, Nightly with default layers.acceleration.disabled for opt, debug on osx 10.6; debug on osx 10.8; layers.acceleration.disabled true for opt and true, false for debug on osx 10.9. seems fine to me.
Status: RESOLVED → VERIFIED
Flags: needinfo?(bob)
Group: core-security-release