crash in gray_rgb_convert mostly on Android ARM

RESOLVED FIXED in Firefox 32

Status

()

--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: kbrosnan, Assigned: mwu)

Tracking

({crash, regression, reproducible})

Trunk
mozilla34
All
Android
crash, regression, reproducible
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify +

Firefox Tracking Flags

(firefox31 wontfix, firefox32+ fixed, firefox33+ fixed, firefox34 fixed, b2g-v2.0 fixed, b2g-v2.1 fixed, fennec+)

Details

(crash signature, URL)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
This bug was filed from the Socorro interface and is 
report bp-aa2a8597-c144-4aaf-869e-097d52140711.
=============================================================

I am able to reproduce this crash by opening the following page http://www.retronaut.com/2013/06/runnymede-eggs-for-graf-zeppelin/?utm_content=buffer614db&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer on a Nexus 4.

https://crash-stats.mozilla.com/report/list?signature=gray_rgb_convert&range_value=7&range_unit=days

0	libxul.so	gray_rgb_convert	media/libjpeg/jdcolext.c
1	libxul.so	sep_upsample	media/libjpeg/jdsample.c
2	libxul.so	process_data_simple_main	media/libjpeg/jdmainct.c
3	libxul.so	jpeg_read_scanlines	media/libjpeg/jdapistd.c
4	libxul.so	mozilla::image::nsJPEGDecoder::OutputScanlines(bool*)	image/decoders/nsJPEGDecoder.cpp
5	libxul.so	mozilla::image::nsJPEGDecoder::WriteInternal(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/decoders/nsJPEGDecoder.cpp
6	libxul.so	mozilla::image::Decoder::Write(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/src/Decoder.cpp
7	libxul.so	mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int, mozilla::image::DecodeStrategy)	image/src/RasterImage.cpp
8	libxul.so	mozilla::image::RasterImage::DecodePool::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::DecodeStrategy, mozilla::image::RasterImage::DecodePool::DecodeType, unsigned int)	image/src/RasterImage.cpp
9	libxul.so	mozilla::image::RasterImage::DecodePool::DecodeJob::Run()	image/src/RasterImage.cpp
10	libxul.so	nsThreadPool::Run()	xpcom/threads/nsThreadPool.cpp
11	libxul.so	nsThread::ProcessNextEvent(bool, bool*)	xpcom/threads/nsThread.cpp
12	libxul.so	NS_ProcessNextEvent(nsIThread*, bool)	xpcom/glue/nsThreadUtils.cpp
13	libxul.so	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)	ipc/glue/MessagePump.cpp
14	libxul.so	MessageLoop::RunInternal()	ipc/chromium/src/base/message_loop.cc
15	libxul.so	MessageLoop::Run()	ipc/chromium/src/base/message_loop.cc
16	libxul.so	nsThread::ThreadFunc(void*)	xpcom/threads/nsThread.cpp
17	libnss3.so	_pt_root	nsprpub/pr/src/pthreads/ptthread.c
Milan, reproducible crash needs an assignee
tracking-fennec: ? → +
Flags: needinfo?(milan)
libjpeg was updated in 31, but this crash predates that.  I imagine it'd be somewhere in our code anyway.
Also repros on my Nexus 10.
Posted file assert log
I get an assert different from the crash location, at least on my Nexus 10:
0x8298d1f4 in mozilla::image::CreateLockedSurface (vbuf=0x945f62e0, size=..., format=mozilla::gfx::B8G8R8A8)
    at /media/jgilbert/fast-ext4/dev/mozilla/central/image/src/imgFrame.cpp:55

GDB log is attached above.
I don't have knowledge of imgFrame. Maybe :seth will know if this is bad?
Flags: needinfo?(seth)
Michael, looks like Jeff is seeing an assert in code related to volatile buffers in imgFrame. (See comment 5) Could you take a look?
Flags: needinfo?(seth) → needinfo?(mwu)
Are these the volatile buffers for decoded images?  If that's the case, Android is the only platform where we currently use that, having run into some regressions on other platforms (see bug 1046277 for a history of bugs that led us there), and it may be worth disabling this on Android as well, and fixing it together on all platforms?

Jeff, since you can reproduce this, could you try making imgFrame::SetDiscardable() a no-op on Android as well, and see if that stops the crash?
Flags: needinfo?(milan) → needinfo?(jgilbert)
(Assignee)

Comment 10

5 years ago
I think I know what's going on. This looks like a problem with the ashmem implementation of VolatileBuffer - it doesn't fall back on normal allocations properly when the kernel refuses to mmap a huge ashmem buffer.
Flags: needinfo?(mwu)
(Assignee)

Comment 11

5 years ago
Guess that makes sense - we're trying to allocate 250+MB for an image here..
(In reply to Michael Wu [:mwu] from comment #10)
> I think I know what's going on. This looks like a problem with the ashmem
> implementation of VolatileBuffer - it doesn't fall back on normal
> allocations properly when the kernel refuses to mmap a huge ashmem buffer.

Nice catch.
Assignee: nobody → mwu
Flags: needinfo?(jgilbert)
(Assignee)

Comment 13

5 years ago
Going to send to try and then test on device.
(In reply to Michael Wu [:mwu] from comment #13)
> Created attachment 8466476 [details] [diff] [review]
> Always close FDs on heap allocation path
> 
> Going to send to try and then test on device.

Nice catch
(Assignee)

Comment 15

5 years ago
I actually managed to load parts of the page with this. It doesn't always work, but it doesn't crash 100% now. This page requires about a gigabyte or more of memory to load all images successfully and I'm not sure jemalloc/Android is able to give us that on these 2GB ram devices.
Attachment #8466476 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #8467501 - Flags: review?(mh+mozilla)
Attachment #8467501 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 18

5 years ago
(In reply to Nigel Babu [:nigelb] from comment #17)
> Does this failure look related to the push?
> https://tbpl.mozilla.org/php/getParsedLog.php?id=45225017&tree=Mozilla-
> Inbound

Don't see a way this would've caused that.
https://hg.mozilla.org/mozilla-central/rev/c607db1a32d0
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
(Assignee)

Comment 20

5 years ago
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Approval Request Comment
[Feature/regressing bug #]: Regression from bug 962670.
[User impact if declined]: Pages with large images are more likely to crash.
[Describe test coverage new/current, TBPL]: Tests cover basic functionality of this code, but don't cover this low memory fallback path.
[Risks and why]: Minimal - fixes a fallback path used in low memory conditions.
[String/UUID change made/needed]: None
Attachment #8467501 - Flags: approval-mozilla-aurora?
(Reporter)

Updated

5 years ago
Blocks: 962670
status-firefox31: --- → affected
status-firefox32: --- → affected
status-firefox33: --- → affected
status-firefox34: --- → fixed
tracking-firefox32: --- → ?
tracking-firefox33: --- → ?
Keywords: regression
(Assignee)

Updated

5 years ago
status-b2g-v2.0: --- → affected
status-b2g-v2.1: --- → fixed
status-firefox31: affected → wontfix
tracking-firefox32: ? → +
tracking-firefox33: ? → +
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Michael, 32 seems affected too. Why didn't you requested an uplift for beta too?
Attachment #8467501 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(mwu)
(Assignee)

Comment 22

5 years ago
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Sure. This should also be good for beta.
Attachment #8467501 - Flags: approval-mozilla-beta?
Flags: needinfo?(mwu)
Comment on attachment 8467501 [details] [diff] [review]
Always cleanup on heap allocation path

Beta+
Attachment #8467501 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Assignee)

Comment 25

5 years ago
[Blocking Requested - why for this release]: This is a general stability fix that's going into Firefox 32 that should also help with B2G.
blocking-b2g: --- → 2.0?
(Assignee)

Updated

5 years ago
blocking-b2g: 2.0? → ---
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.