Closed
Bug 1067018
Opened 10 years ago
Closed 10 years ago
crash in mozilla::gfx::SourceSurfaceAlignedRawData::InitWithStride(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, int, bool)
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla35
Tracking | Status | |
---|---|---|
firefox32 | --- | unaffected |
firefox33 | + | verified |
firefox34 | + | verified |
firefox35 | + | verified |
b2g-v2.0 | --- | unaffected |
b2g-v2.0M | --- | unaffected |
b2g-v2.1 | --- | fixed |
b2g-v2.2 | --- | fixed |
fennec | 33+ | --- |
People
(Reporter: kbrosnan, Assigned: jchen)
References
Details
(Keywords: crash, topcrash-android-armv7)
Crash Data
Attachments
(2 files, 1 obsolete file)
5.67 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
1.80 KB,
patch
|
jwatt
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
[Tracking Requested - why for this release]: topcrash regression
This bug was filed from the Socorro interface and is
report bp-cdde3447-cb4b-4dd3-aad0-11edc2140910.
=============================================================
Samsung GT-I9500 (Galaxy S4) is responsible for over half the crashes for this bug.
Updated•10 years ago
|
Updated•10 years ago
|
Assignee: nobody → milan
tracking-fennec: ? → 33+
Comment 2•10 years ago
|
||
Assignee: milan → jwatt
Attachment #8493105 -
Flags: review?(milan)
Updated•10 years ago
|
Attachment #8493105 -
Flags: review?(milan) → review+
Updated•10 years ago
|
Flags: needinfo?(milan)
Comment 3•10 years ago
|
||
Comment on attachment 8493105 [details] [diff] [review]
patch
Actually, this can't be the problem. I was thinking that this class was one of the classes for which we override |operator new| to be fallible, but that's not the case.
Attachment #8493105 -
Attachment is obsolete: true
Comment 4•10 years ago
|
||
(I spun an opt build with debug symbols for the Nexus 5 and can confirm that we do indeed use mozalloc.h's |operator new(size_t size) MOZALLOC_THROW_BAD_ALLOC| for the |new SourceSurfaceAlignedRawData()| line.)
Comment 5•10 years ago
|
||
So we seem to have the stack:
AlignedArray<uint8_t,16>::Realloc
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/Tools.h#l141
SourceSurfaceAlignedRawData::InitWithStride
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/SourceSurfaceRawData.cpp#l64
Factory::CreateDataSourceSurfaceWithStride
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/Factory.cpp#l718
Given comment 4, I don't see anything wrong with Factory::CreateDataSourceSurfaceWithStride. Nor do I see anything wrong with SourceSurfaceAlignedRawData::InitWithStride.
The only thing that I can see that is on the surface a bit suspicious with AlignedArray::Realloc is that we're using CheckedInt32 (instead of CheckedInt<size_t>, because that doesn't work, and because we want to keep the value below the max positive value that uint32_t can represent), so when we have:
mStorage = new (std::nothrow) uint8_t[storageByteCount.value()];
we're coercing the value returned from storageByteCount.value() from int32_t to size_t. I don't see how that can cause this crash though. We know that the value in storageByteCount is positive, so it must be convertible to size_t without rounding or overflow. On the crashing line we then pass this value to mozalloc.h's |operator new []|, which then passes it to moz_malloc, which then passes it to malloc (size_t all the way through). The only thing that can go wrong is that the requested about of memory can't be malloc'ed and we return nullptr up the stack.
Even _if_ the value passed through to malloc was zero (it can't be), and even _if_ malloc returned non-nullptr in that case, we shouldn't crash on the line that is being blamed by the crash reports. So I'm a bit stumped here.
Jim, do you have any ideas or insights?
Flags: needinfo?(nchen)
Comment 7•10 years ago
|
||
(In reply to Jonathan Watt [:jwatt] from comment #5)
> positive value that uint32_t can represent
(Sorry, I meant max positive value that *int32_t* can represent.
Comment 9•10 years ago
|
||
Updating for new post-bug 1063733 InitWithStride signature.
Crash Signature: [@ mozilla::gfx::SourceSurfaceAlignedRawData::InitWithStride(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, int)] → [@ mozilla::gfx::SourceSurfaceAlignedRawData::InitWithStride(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, int, bool)]
Summary: crash in mozilla::gfx::SourceSurfaceAlignedRawData::InitWithStride(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, int) → crash in mozilla::gfx::SourceSurfaceAlignedRawData::InitWithStride(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, int, bool)
Assignee | ||
Comment 11•10 years ago
|
||
Looks like the crash happens inside the new statement at [1], which consists of calling moz_malloc to allocate memory [2] and then initializing the memory to zero. The problem is the initialization step in the generated code doesn't check for nullptr result, and we crash trying to initialize a block of memory starting at 0x00000000. I'm not sure if this is a compiler bug or something with our code. I'm not even sure the memory should be initialized to zero [3]. The same code in my local build using GCC 4.9 doesn't do initialization.
[1] http://mxr.mozilla.org/mozilla-central/source/gfx/2d/Tools.h#157
[2] http://mxr.mozilla.org/mozilla-central/source/memory/mozalloc/mozalloc.h?rev=8252eae8278c#216
[3] http://stackoverflow.com/a/7546745
Flags: needinfo?(nchen)
Comment 12•10 years ago
|
||
I wouldn't get *too* caught up in the details here wrt zeroing. If moz_malloc() indeed failed it means we're gonna crash soon anyway, because we aren't hardened against that.
If I had to guess, we are asking for a crazy amount of memory. That has happened before with these sorts of allocations in gfx.
Comment 13•10 years ago
|
||
jim and snorp seem to be on top of this. I took a quick look as well but nothing jumped out at me.
Flags: needinfo?(bugmail.mozilla)
Comment 14•10 years ago
|
||
Jim: My understanding is that the code should not be zeroing. The fact that more recent versions of GCC do not zero would seem to indicate that the older GCC (4.7.0 apparently) that we use on the build machines either had intended behavior that was later changed, or else was buggy.
I guess we could be setting some defines/compiler flags somewhere to flip on zeroing, but the fact that the crash also exists in Nightly builds seems to make that unlikely:
https://crash-stats.mozilla.com/topcrasher/products/FennecAndroid/versions/35.0a1
Do you get the zeroing in the generated code for that file if you compile it with 4.7?
Assuming the compiler is the issue, where do we go from here? Besides causing this crash, presumably this unrequested zeroing of arrays is a perf issue. But who knows what bugs the zeroing may be hiding elsewhere, making updating the compiler for beta at this stage somewhat risky.
Comment 15•10 years ago
|
||
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #12)
> I wouldn't get *too* caught up in the details here wrt zeroing. If
> moz_malloc() indeed failed it means we're gonna crash soon anyway, because
> we aren't hardened against that.
Not necessarily if this is a big allocation, which it probably is. Bug 1067998 for example was really easy to repro, but after fixing it I can't get fennec to OOM even after opening loads of tabs containing the crashing page.
Comment 16•10 years ago
|
||
Indeed, it looks we can recover from this if we fix the zeroing issue.
Comment 17•10 years ago
|
||
Jim, I know you were looking further into the zeroing issue. What's the status there?
Flags: needinfo?(nchen)
Assignee | ||
Comment 18•10 years ago
|
||
So to summarize the issue, when you allocate an array of built-in type in C++, you can tell the compiler to initialize the memory to zero by using,
> int* mem = new int[10]();
Or you can tell it to *not* initialize by using,
> int* mem = new int[10];
However, the version of GCC used by the Android buildbots (4.7) apparently treats the two the same way -- the memory is always initialized to zero.
That by itself should not cause this crash; it's just potentially inefficient. The crash is caused by the fact that the Android NDK has a separate bug in its headers. Its headers declare the nothrow placement new operator as,
> void* operator new[](size_t size, const std::nothrow_t&)
Whereas the correct declaration is
> void* operator new[](size_t size, const std::nothrow_t&) throw()
The 'throw()' part is important, because the point of using nothrow is to say that if the allocation fails, instead of throwing an exception (or crashing), NULL is returned.
When you bring zero-initialization into the equation, you can see that when the compiler is doing zero-initialization, it must check for NULL when using nothrow. GCC only adds NULL-checking to its generated initialization code if we specify 'throw()' as part of the new operator declaration. Because the NDK headers don't specify that, GCC erroneously assumes the nothrow placement new operator will never return NULL, and when it does return NULL in real life, the code tries to initialize the NULL memory block to zero, causing this crash.
Flags: needinfo?(nchen)
Assignee | ||
Comment 19•10 years ago
|
||
This patch overrides the NDK <new> header with our own copy. Our copy correctly specifies "throw()" and makes GCC correctly perform null-checks.
Attachment #8494703 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 20•10 years ago
|
||
Making sure allocation/deallocation all match.
Attachment #8494743 -
Flags: review?(jwatt)
Assignee | ||
Comment 21•10 years ago
|
||
Actually, either patch should fix this crash.
Comment 23•10 years ago
|
||
Comment on attachment 8494743 [details] [diff] [review]
Make sure calloc/malloc/free usages match in Tools.h (v1)
(In reply to Jim Chen [:jchen :nchen] from comment #21)
> Actually, either patch should fix this crash.
Seems like we should definitely take the throw() patch on branches since, although this patch fixes this crash, there's a good chance there are others that only the throw() patch will fix.
Attachment #8494743 -
Flags: review?(jwatt) → review+
Comment 24•10 years ago
|
||
Jim, can you land that asap so that we can have it in aurora & beta soon? Thanks
Flags: needinfo?(nchen)
Assignee | ||
Comment 25•10 years ago
|
||
I pushed one of the patches since the two are separate and either one fixes the bug,
https://hg.mozilla.org/integration/mozilla-inbound/rev/1af55e86f417
This patch also has lower risk for uplifting.
Comment 26•10 years ago
|
||
Assignee | ||
Comment 27•10 years ago
|
||
Comment on attachment 8494743 [details] [diff] [review]
Make sure calloc/malloc/free usages match in Tools.h (v1)
Approval Request Comment
[Feature/regressing bug #]: N/A
[User impact if declined]: Top crash on Android
[Describe test coverage new/current, TBPL]: Locally, m-c
[Risks and why]: Very small; no change in functionality
[String/UUID change made/needed]: None
Attachment #8494743 -
Flags: approval-mozilla-beta?
Attachment #8494743 -
Flags: approval-mozilla-aurora?
Comment 28•10 years ago
|
||
Comment on attachment 8494703 [details] [diff] [review]
Always specify throw() for nothrow placement new/delete (v1)
Review of attachment 8494703 [details] [diff] [review]:
-----------------------------------------------------------------
Please add a note in build/stlport/README.mozilla that overrides/ is a mozilla thing.
::: build/stlport/overrides/new
@@ +27,5 @@
> + * SUCH DAMAGE.
> + */
> +/*
> + * This header is taken from $ndk/sources/cxx-stl/system/include/new,
> + * and it fixes a bug in the NDK header where the nothrow versions of
Did you report this upstream? They sure should fix it.
Attachment #8494703 -
Flags: review?(mh+mozilla) → review+
Updated•10 years ago
|
Attachment #8494743 -
Flags: approval-mozilla-beta?
Attachment #8494743 -
Flags: approval-mozilla-beta+
Attachment #8494743 -
Flags: approval-mozilla-aurora?
Attachment #8494743 -
Flags: approval-mozilla-aurora+
Updated•10 years ago
|
Comment 29•10 years ago
|
||
Comment 30•10 years ago
|
||
Embedlite tests stopped compiling after these changes
In file included from ../../../dist/include/mozilla/gfx/Rect.h:12:0,
from ../../../dist/include/Units.h:12,
from ../../../dist/include/InputData.h:12,
from ../../../dist/include/mozilla/embedlite/EmbedLiteView.h:13,
/mozilla-central/embedding/embedlite/tests/embedLiteViewInitTest.cpp:8:
../../../dist/include/mozilla/gfx/Tools.h: In member function 'void mozilla::gfx::AlignedArray<T, alignment>::Dealloc()':
../../../dist/include/mozilla/gfx/Tools.h:136:22: error: there are no arguments to 'moz_free' that depend on a template parameter, so a declaration of 'moz_free' must be available [-fpermissive]
../../../dist/include/mozilla/gfx/Tools.h:136:22: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
../../../dist/include/mozilla/gfx/Tools.h: In member function 'void mozilla::gfx::AlignedArray<T, alignment>::Realloc(size_t, bool)':
../../../dist/include/mozilla/gfx/Tools.h:143:22: error: there are no arguments to 'moz_free' that depend on a template parameter, so a declaration of 'moz_free' must be available [-fpermissive]
../../../dist/include/mozilla/gfx/Tools.h:157:79: error: there are no arguments to 'moz_calloc' that depend on a template parameter, so a declaration of 'moz_calloc' must be available [-fpermissive]
../../../dist/include/mozilla/gfx/Tools.h:159:76: error: there are no arguments to 'moz_malloc' that depend on a template parameter, so a declaration of 'moz_malloc' must be available [-fpermissive]
make[1]: *** [embedLiteViewInitTest.o] Error 1
Adding #include "mozilla/mozalloc.h" did not help to resolve the issue.
Any ideas what could be wrong ?
here is our moz.build https://github.com/tmeshkova/gecko-dev/blob/embedlite/embedding/embedlite/tests/moz.build
Comment 31•10 years ago
|
||
Comment 32•10 years ago
|
||
Does this need to be open still? Also, does this affect B2G 2.0 (b2g32)?
status-b2g-v2.0:
--- → ?
status-b2g-v2.1:
--- → fixed
status-b2g-v2.2:
--- → fixed
Flags: needinfo?(nchen)
Assignee | ||
Comment 33•10 years ago
|
||
(In reply to Oleg Romashin (:romaxa) from comment #30)
> Embedlite tests stopped compiling after these changes
>
> Adding #include "mozilla/mozalloc.h" did not help to resolve the issue.
> Any ideas what could be wrong ?
> here is our moz.build
> https://github.com/tmeshkova/gecko-dev/blob/embedlite/embedding/embedlite/
> tests/moz.build
Hmm I have no idea. I guess you can print out the preprocessed source (replace -c in the GCC command line with -E) and see why the functions are not included. Maybe try declaring the functions locally as a sanity check?
Keywords: leave-open
Assignee | ||
Comment 34•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #32)
> Does this need to be open still? Also, does this affect B2G 2.0 (b2g32)?
Guess not. :mwu said B2G is not affected.
Flags: needinfo?(nchen)
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-b2g-v2.0M:
--- → unaffected
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 35•10 years ago
|
||
I'm seeing 0 reports of new crashes in the last week with Fennec 35.0a1, 34.0a2, or 33.0b8. Marking this bug verified fixed based on crashstats.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•