Closed Bug 1044193 Opened 10 years ago Closed 10 years ago

crash in gray_rgb_convert mostly on Android ARM

Categories

(Core :: Graphics: ImageLib, defect)

All
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox31 --- wontfix
firefox32 + fixed
firefox33 + fixed
firefox34 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed
fennec + ---

People

(Reporter: kbrosnan, Assigned: mwu)

References

()

Details

(Keywords: crash, regression, reproducible)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-aa2a8597-c144-4aaf-869e-097d52140711.
=============================================================

I am able to reproduce this crash by opening the following page http://www.retronaut.com/2013/06/runnymede-eggs-for-graf-zeppelin/?utm_content=buffer614db&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer on a Nexus 4.

https://crash-stats.mozilla.com/report/list?signature=gray_rgb_convert&range_value=7&range_unit=days

0	libxul.so	gray_rgb_convert	media/libjpeg/jdcolext.c
1	libxul.so	sep_upsample	media/libjpeg/jdsample.c
2	libxul.so	process_data_simple_main	media/libjpeg/jdmainct.c
3	libxul.so	jpeg_read_scanlines	media/libjpeg/jdapistd.c
4	libxul.so	mozilla::image::nsJPEGDecoder::OutputScanlines(bool*)	image/decoders/nsJPEGDecoder.cpp
5	libxul.so	mozilla::image::nsJPEGDecoder::WriteInternal(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/decoders/nsJPEGDecoder.cpp
6	libxul.so	mozilla::image::Decoder::Write(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/src/Decoder.cpp
7	libxul.so	mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/src/RasterImage.cpp
8	libxul.so	mozilla::image::RasterImage::DecodePool::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::DecodeStrategy, mozilla::image::RasterImage::DecodePool::DecodeType, unsigned int)	image/src/RasterImage.cpp
9	libxul.so	mozilla::image::RasterImage::DecodePool::DecodeJob::Run()	image/src/RasterImage.cpp
10	libxul.so	nsThreadPool::Run()	xpcom/threads/nsThreadPool.cpp
11	libxul.so	nsThread::ProcessNextEvent(bool, bool*)	xpcom/threads/nsThread.cpp
12	libxul.so	NS_ProcessNextEvent(nsIThread*, bool)	xpcom/glue/nsThreadUtils.cpp
13	libxul.so	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)	ipc/glue/MessagePump.cpp
14	libxul.so	MessageLoop::RunInternal()	ipc/chromium/src/base/message_loop.cc
15	libxul.so	MessageLoop::Run()	ipc/chromium/src/base/message_loop.cc
16	libxul.so	nsThread::ThreadFunc(void*)	xpcom/threads/nsThread.cpp
17	libnss3.so	_pt_root	nsprpub/pr/src/pthreads/ptthread.c
Milan, reproducible crash needs an assignee
tracking-fennec: ? → +
Flags: needinfo?(milan)
libjpeg was updated in 31, but this crash predates that.  I imagine it'd be somewhere in our code anyway.
Also repros on my Nexus 10.
Attached file assert log
I get an assert different from the crash location, at least on my Nexus 10:
0x8298d1f4 in mozilla::image::CreateLockedSurface (vbuf=0x945f62e0, size=..., format=mozilla::gfx::B8G8R8A8)
    at /media/jgilbert/fast-ext4/dev/mozilla/central/image/src/imgFrame.cpp:55

GDB log is attached above.
I don't have knowledge of imgFrame. Maybe :seth will know if this is bad?
Flags: needinfo?(seth)
Michael, looks like Jeff is seeing an assert in code related to volatile buffers in imgFrame. (See comment 5) Could you take a look?
Flags: needinfo?(seth) → needinfo?(mwu)
Are these the volatile buffers for decoded images?  If that's the case, Android is the only platform where we currently use that, having run into some regressions on other platforms (see bug 1046277 for a history of bugs that led us there), and it may be worth disabling this on Android as well, and fixing it together on all platforms?

Jeff, since you can reproduce this, could you try making imgFrame::SetDiscardable() a no-op on Android as well, and see if that stops the crash?
Flags: needinfo?(milan) → needinfo?(jgilbert)
I think I know what's going on. This looks like a problem with the ashmem implementation of VolatileBuffer - it doesn't fall back on normal allocations properly when the kernel refuses to mmap a huge ashmem buffer.
Flags: needinfo?(mwu)
Guess that makes sense - we're trying to allocate 250+MB for an image here..
(In reply to Michael Wu [:mwu] from comment #10)
> I think I know what's going on. This looks like a problem with the ashmem
> implementation of VolatileBuffer - it doesn't fall back on normal
> allocations properly when the kernel refuses to mmap a huge ashmem buffer.

Nice catch.
Assignee: nobody → mwu
Flags: needinfo?(jgilbert)
Going to send to try and then test on device.
(In reply to Michael Wu [:mwu] from comment #13)
> Created attachment 8466476 [details] [diff] [review]
> Always close FDs on heap allocation path
> 
> Going to send to try and then test on device.

Nice catch
I actually managed to load parts of the page with this. It doesn't always work, but it doesn't crash 100% now. This page requires about a gigabyte or more of memory to load all images successfully and I'm not sure jemalloc/Android is able to give us that on these 2GB ram devices.
Attachment #8466476 - Attachment is obsolete: true
Attachment #8467501 - Flags: review?(mh+mozilla)
Attachment #8467501 - Flags: review?(mh+mozilla) → review+
(In reply to Nigel Babu [:nigelb] from comment #17)
> Does this failure look related to the push?
> https://tbpl.mozilla.org/php/getParsedLog.php?id=45225017&tree=Mozilla-
> Inbound

Don't see a way this would've caused that.
https://hg.mozilla.org/mozilla-central/rev/c607db1a32d0
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Approval Request Comment
[Feature/regressing bug #]: Regression from bug 962670.
[User impact if declined]: Pages with large images are more likely to crash.
[Describe test coverage new/current, TBPL]: Tests cover basic functionality of this code, but don't cover this low memory fallback path.
[Risks and why]: Minimal - fixes a fallback path used in low memory conditions.
[String/UUID change made/needed]: None
Attachment #8467501 - Flags: approval-mozilla-aurora?
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Michael, 32 seems affected too. Why didn't you requested an uplift for beta too?
Attachment #8467501 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(mwu)
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Sure. This should also be good for beta.
Attachment #8467501 - Flags: approval-mozilla-beta?
Flags: needinfo?(mwu)
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Beta+
Attachment #8467501 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
[Blocking Requested - why for this release]: This is a general stability fix that's going into Firefox 32 that should also help with B2G.
blocking-b2g: --- → 2.0?
blocking-b2g: 2.0? → ---
Flags: qe-verify+
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: