intermittent autophone-s1s2 | application crashed [@ ElfLoader::~ElfLoader]
Categories
(Firefox for Android Graveyard :: Toolbar, defect, P5)
Tracking
(firefox41 wontfix, firefox42 wontfix, firefox43+ wontfix, firefox44+ wontfix, fennec+, firefox52- wontfix, firefox53- wontfix, firefox54+ wontfix, firefox55+ wontfix, firefox56 affected, firefox57 affected)
People
(Reporter: csuciu, Assigned: jnicol)
References
Details
(4 keywords, Whiteboard: [gfx-noted])
Crash Data
Attachments
(7 files)
1. 2.77 MB, application/gzip - Details
2. 1.81 MB, application/gzip - Details
3. 302.41 KB, application/gzip - Details
4. 201.37 KB, application/gzip - Details
5. 1.91 KB, patch (Margaret: review+) - Details | Diff | Splinter Review
6. 58 bytes, text/x-review-board-request (kats: review+, snorp: review+) - Details
7. 1.75 KB, patch (jrmuizel: review+) - Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-090e99ed-c220-470b-be34-91be62150512.
=============================================================
Same signature as in bug #1152308. This crash occurs when loading www.ziare.com. Not always reproducible, but happens pretty often.
From logs:
W/Adreno-GSL(25612): <sharedmem_gpumem_alloc_id:1489>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
E/Adreno-GSL(25612): <ioctl_kgsl_sharedmem_alloc:1590>: ioctl_kgsl_sharedmem_alloc: FATAL ERROR : (null)
D/InputReader( 840): Input event: value=1 when=208441729476000

Frame  Module               Signature                    Source
0      libmozglue.so        ElfLoader::~ElfLoader()      mozglue/linker/ElfLoader.cpp
Ø 1    libc.so              libc.so@0x28be5
2      libmozglue.so        libmozglue.so@0x25f03
3      libmozglue.so        _Unwind_GetTextRelBase
4      libmozglue.so        _Unwind_GetTextRelBase
Ø 5    libgsl.so            libgsl.so@0x1c546
Ø 6    libgsl.so            libgsl.so@0x1d4dd
Ø 7    libgsl.so            libgsl.so@0x1ee9e
Ø 8    libgsl.so            libgsl.so@0x1ee9e
Ø 9    libc.so              libc.so@0x28ddf
Ø 10   libgsl.so            libgsl.so@0x1037b
Ø 11   libgsl.so            libgsl.so@0x1d683
Ø 12   libgsl.so            libgsl.so@0x1c546
Ø 13   libgsl.so            libgsl.so@0x1d683
Ø 14   libgsl.so            libgsl.so@0x1fd0e
Ø 15   libgsl.so            libgsl.so@0x5959
Ø 16   libGLESv2_adreno.so  libGLESv2_adreno.so@0x7bb13
Comment 1•9 years ago
This signature is a topcrash.
Reporter
Comment 2•9 years ago
From what I'm seeing in the reports, this crash happens mostly (if not entirely) on devices with Adreno GPUs.
Updated•9 years ago
I think this is another incarnation of the GL leak that we seem to be seeing. Similar to bug 1177421?
Kevin, it may help Jamie if we can get some URLs for this crash (see comment #4).
Reporter
Comment 6•9 years ago
This crash is reproducible on some devices when playing videos on the YouTube desktop site. Sometimes the crash will occur immediately after starting to play a video; sometimes I have to wait a bit, maybe play a second video, but eventually Firefox will crash.
Build: latest Nightly
Devices:
Samsung Galaxy Note 3 (4.4.2) https://crash-stats.mozilla.com/report/index/1bf2d407-735c-4797-b387-008842150713
Sony Xperia Z2 (4.4.4) https://crash-stats.mozilla.com/report/index/5c3981f1-4143-4f04-a179-3a31a2150713
Updated•9 years ago
Reporter
Comment 8•9 years ago
Nightly crashes with this signature every time I visit http://lifeinthegrid.com/simple-css-fixed-header/ (http://goo.gl/WlqOMD) on Galaxy Note 3 (4.4.2)
Comment 9•9 years ago
For me it doesn't involve video. It's just a matter of scrolling up or down on long pages.
Comment 10•9 years ago
Can this be tracked for 43? Using Nightly is just too painful - it crashes almost every session I use my mobile to browse the web with.
Assignee
Comment 11•9 years ago
Ludovic, if you set layers.tiles.adjust to false in about:config, does that make it crash less frequently? And does your device have a high-resolution screen?
Comment 12•9 years ago
[Tracking Requested - why for this release]: This is now #1 in terms of crash volume. It is probably too late for 42, but we should fix this bug in 43. Leaving the tracking request to be sure Liz sees it.
Comment 13•9 years ago
(In reply to Jamie Nicol [:jnicol] from comment #11)
> Ludovic, if you set layers.tiles.adjust to false in about:config, does that
> make it crash less frequently? And does your device have a high-resolution
> screen?
Yes for the high resolution. Done, will report.
Updated•9 years ago
Comment 14•9 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #10)
> Can this be tracked for 43? Using Nightly is just too painful - it crashes
> almost every session I use my mobile to browse the web with.
Looks like Sylvestre took care of setting the right flags.
Comment 15•9 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #13)
> Yes for the high resolution. Done, will report.
It feels way less crashy, but it still crashes.
Comment 16•9 years ago
Tracking for 43+ since it sounds like this crash is affecting people despite the possible workaround in comment 11.
Comment 17•9 years ago
I have a Nexus 5. For me, it's not *just* length of page: it's long pages with lots of embedded media. I can find lots of pages that constantly fail after scrolling. I've stayed on Nightly 44.0a1 (20151020031317) because it was the first Nightly to trigger this crash constantly for me. I've just been too lazy / busy to dig in! How do I help?
Assignee
Comment 18•9 years ago
I'm hopeful bug 1223479 will have helped with this. If people could try the latest nightly (2015-11-12) and see if things are any better that would be great.
Reporter
Comment 19•9 years ago
I'm still able to reproduce the crash on latest Nightly (2015-11-12) using the URL/device from comment #8
Assignee
Comment 20•9 years ago
Hmmm. The display port still looks very large for that page. And it's creating multiple painted layers, each of them using many tiles. Definitely still work to do so we don't crash here.
Comment 21•9 years ago
(In reply to Jamie Nicol [:jnicol] from comment #18)
> I'm hopeful bug 1223479 will have helped with this. If people could try the
> latest nightly (2015-11-12) and see if things are any better that would be
> great.
I still see this frequently with 20151113030248. I've found that http://www.kansascity.com/news/state/missouri/article44912667.html demonstrates this crash frequently.
Updated•8 years ago
Assignee
Comment 22•8 years ago
Links to crashing pages are very helpful, thanks. The page mentioned in comment 21 layerizes absolutely insanely - tens or possibly even a hundred painted layers - which unsurprisingly uses a lot of memory. The one from comment 8 also has several painted layers and a gigantic mask layer. Investigating why now.
Jamie, it looks like we have layers.max-active set to 20 right now. That seems pretty high to me, but I also wonder if it's being enforced...
Updated•8 years ago
Assignee
Comment 24•8 years ago
layers.max-active seems to only be honoured in the code path that uses PaintedLayerDataTree::AddingOwnLayer() to add a layer. The other branch, which uses PaintedLayerDataTree::FindPaintedLayerFor(), does not honour it. And in this case we keep processing items with different animated geometry roots (AGRs), which forces FindPaintedLayerFor to create a new layer for each AGR. Not sure why this page has so many AGRs. Or something like that - FrameLayerBuilder is confusing :/
Assignee
Comment 25•8 years ago
For the page in comment 21: we are creating lots of layers because the page contains lots of items with animated offsets. Bug 1231818 has been filed for fixing this. It might help with various pages but will not be the end of our memory problems, which is why I've filed it as a separate bug.
Comment 26•8 years ago
Too late for 43. Is this still something we need to keep open or track for 44+?
Comment 27•8 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #26)
> Too late for 43. Is this still something we need to keep open or track for
> 44+?
It's the top crash by a decent margin on 44, so I'd keep tracking it.
Snorp, jnicol: this crash continues to be the topmost issue on FennecAndroid according to crash-stats. Is there anything that we can do to address this? Do you want to consider uplifting the fix from bug 1231818 to Beta/Aurora to see if it addresses this top crash?
I was looking at Nightly 46.0a2 data to see if the patch from bug 1231818 helped fix this crash. I can still see build 2016-01-10 on Nightly64 hitting this crash signature. :(
Assignee
Comment 30•8 years ago
The patch from bug 1231818 will help with sites like the one in comment 21, where we were crashing due to creating tens or hundreds of small layers. Unfortunately I don't think this was that common a situation. So I would expect we're still hitting the signature as top crash, but if it has had any measurable reduction then it might be worthwhile uplifting.
Comment 31•8 years ago
The #15 top crash on 44b8 has a signature that first appeared on Dec. 19 and seems to have a lot of the same code on the stack:
#15 1.04% new libmozglue.so@0x23dd6 Browser Crash 70 0 0 0 7 2015-12-19
https://crash-stats.mozilla.com/signature/?product=FennecAndroid&version=44.0b8&date=%3C2016-01-16T17%3A19%3A21&date=%3E%3D2016-01-09T17%3A19%3A21&signature=libmozglue.so%400x23dd6&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
Same crash? If not I'll file another bug.
Comment 32•8 years ago
The ElfLoader crash has a similar graphics adapter correlation to the libmozglue.so crash, but maybe that's just a reflection of market share:
ElfLoader::~ElfLoader
  Qualcomm Adreno (TM) 330  17858  65.870%
  Qualcomm Adreno (TM) 320   5478  20.206%
  Qualcomm Adreno (TM) 305   2450   9.037%
  (plus a larger assortment of small percentages for more adapters)
libmozglue.so@0x23dd6
  Qualcomm Adreno (TM) 330     34  68.000%
  Qualcomm Adreno (TM) 320     15  30.000%
  Qualcomm Adreno (TM) 305      1   2.000%
Assignee
Comment 33•8 years ago
I would be very confident they are the same OOM crash.
Comment 34•8 years ago
Would an adb log help? I think I can reproduce quite easily.
Assignee
Comment 35•8 years ago
The adb log with layout.display-list.dump=true could be useful, along with the URL you can crash on. A lot of this seems to me to be bad layerisation of certain websites. This is certainly the case for the URLs in comment 8 and comment 21.
Comment 36•8 years ago
adb logcat with the dump pref set, crashing when scrolling at http://wiki.openstreetmap.org/wiki/FR:%C3%89l%C3%A9ments_cartographiques
Assignee
Comment 37•8 years ago
Thanks Ludovic, that's very helpful. I can see in the dump that we have a couple of very large painted layers, each basically the size of the entire page. This means we use 2 display ports worth of tiles rather than just 1. It looks like instead of being flattened into a single layer, they are separated by a paintedlayer which is backface hidden, because there will have been a backface-hidden item in the middle of the display list. I've noticed the mobile wikipedia website does the same thing.
While this remains a top crash for Fennec, I doubt we will have a fix ready for Fx44, so this is now a wontfix.
Updated•8 years ago
Assignee
Comment 39•8 years ago
The problem described in comment 37 might be solvable. I've opened bug 1241078. Ludovic, when you reproduce on that website, is it only by scrolling vertically? Can you still reproduce using the build at https://archive.mozilla.org/pub/mobile/try-builds/jnicol@mozilla.com-504f631ad8803cb992a0ffaa7a4049aa12a28370/try-android-api-11/ ? (Horizontal scrolling in the tables is a different problem. When the table is scrolled it gets its own layer so both the table and the main page get a display port.)
Comment 40•8 years ago
(In reply to Jamie Nicol [:jnicol] from comment #39)
> The problem described in comment 37 might be solvable. I've opened bug
> 1241078.
> Ludovic, when you reproduce on that website, is it only by scrolling
> vertically? Can you still reproduce using the build at
> https://archive.mozilla.org/pub/mobile/try-builds/jnicol@mozilla.com-504f631ad8803cb992a0ffaa7a4049aa12a28370/try-android-api-11/ ?
Yeah, I'm only scrolling up and down to make it crash. I don't think I intentionally scroll horizontally. I got this with your build: https://crash-stats.mozilla.com/report/index/ab0ef68a-3390-4287-b9ec-e27632160120
Assignee
Comment 41•8 years ago
Could you attach a logcat with layers.displaylist.dump=true with that build? Thanks.
Comment 42•8 years ago
Here you go.
Assignee
Comment 43•8 years ago
Hmm, the log shows the patch has worked as intended: there is only one very large painted layer, not two. The crash report contains "Failed to allocate a TextureClient", but from the layer dump I can't see why we'd be allocating too many textures. Could you please try (logcat with display list dump) with this build: https://archive.mozilla.org/pub/mobile/try-builds/jnicol@mozilla.com-deb578f4b007952cbb6c7bb8e61ad26862cd67a7/try-android-api-11/ (the same as before but with TextureClientPool logging enabled.)
Comment 44•8 years ago
here you go
Comment 45•8 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #44)
> Created attachment 8710090 [details]
> bug3.txt.gz asked log
>
> here you go
I got https://crash-stats.mozilla.com/report/index/89ffb58f-ca6d-4482-89ff-946932160120 - this is the signature for the crash in the log above.
Assignee
Comment 46•8 years ago
> TexturePool 0x925ec480 giving new 0x93102970; size 0 outstanding 676
No wonder we're OOMing.
Your layer tree in bug3.txt is different from the one in bug2.txt. In bug3.txt there are 2 multi-tiled painted layers, whereas there's just the 1 in bug2.txt. I suppose that's just because we crashed while allocating tiles for the newer layer tree, before getting to dump the newer tree.
In bug3.txt the layers have valid regions of 1080x4864 and 1792x21248. 21248 looks too high to me, but that should still only be ~200 tiles.
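As a rough sanity check on those numbers, the tile counts can be computed directly. This is a sketch; the 512x512 tile size is an assumption on my part, not something stated in the log:

```c
#include <stdint.h>

/* Number of tiles needed to cover a w x h valid region, rounding
   partial tiles up in each dimension. The 512x512 tile size used
   in the examples below is an assumption. */
uint32_t tiles_for(uint32_t w, uint32_t h, uint32_t tile) {
    return ((w + tile - 1) / tile) * ((h + tile - 1) / tile);
}
```

With 512x512 tiles this gives 30 tiles for the 1080x4864 region and 168 for 1792x21248 - 198 in total, consistent with the ~200 estimate above. At 512x512x4 bytes per RGBA tile, the 676 outstanding texture clients from the TexturePool log line would be on the order of 676 MB, which is why the OOM is unsurprising.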
Assignee
Comment 47•8 years ago
Ludovic, I'm fairly certain that this is the problem I mentioned in comment 39 about scrollable subframes. Even when attempting to just scroll the main content vertically I think it is almost impossible not to accidentally scroll the table, causing the extra layer. I can't reproduce a crash, but I can reproduce the extra layer and hundreds of tiles. And you can see in the display-list dump that the items in that layer are indeed table elements. The table layer's valid region is definitely too large, however. It looks like we are perhaps rendering the entire layer rather than just a display port. I think we've found a bug here!
Assignee
Comment 48•8 years ago
I have filed bug 1241917 about scrollable subframes having too large display ports.
Updated•8 years ago
We are now annotating crash reports with the current texture usage, and all of the crashes on Nightly with this signature have high usage -- usually above 150MB.
Comment 51•8 years ago
Found another website where I crash with this signature; adding a logcat from the latest Nightly. I'll happily do more logcats with prefs set, if I'm told to. Can someone look at the log and tell me if they are worth something?
(In reply to Ludovic Hirlimann [:Usul] from comment #52)
> Anything interesting in the log from comment 51?
I don't see any new information, but thanks! I think we are fairly confident at this point that our issues are mostly due to the creation of too many layers - and certainly too many textures and backing surfaces. Jamie is working on a patch to limit that in bug 1247554, so you may want to follow along there as well.
Milan landed bug 1252929, which should greatly reduce the maximum damage we can do with layer creation. However, in the 3/6 Nightly I already see we have an instance of this crash[0] that shows OOM with a high number of textures. Bummer.
[0] https://crash-stats.mozilla.com/report/index/591f6916-ec8c-4b8e-a50c-246e02160306
Comment 55•8 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #54)
> Milan landed bug 1252929, which should greatly reduce the maximum damage we
> can do with layer creation. However, in the 3/6 Nightly I already see we
> have an instance of this crash[0] that shows OOM with a high number of
> textures. Bummer.
If I crash with one of these, what would be useful for you to capture?
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #54)
> Milan landed bug 1252929, which should greatly reduce the maximum damage we
> can do with layer creation. However, in the 3/6 Nightly I already see we
> have an instance of this crash[0] that shows OOM with a high number of
> textures. Bummer.
> [0] https://crash-stats.mozilla.com/report/index/591f6916-ec8c-4b8e-a50c-246e02160306
I was wrong, that patch had NOT landed yet. Another one on the bug did. The pref change is in inbound now, so hopefully it makes the next Nightly.
Comment 58•8 years ago
I still see crashes matching this signature in crash-stats on recent builds, and we have another user in bug 1241261 who can reproduce this pretty easily. Do we have other things we can try, or more logging that would be useful?
Jim and I believe this bug is actually due to running out of address space (the 3GB limit). So it's not quite a typical OOM, but more like we have too many files mapped.
One thing that contributes greatly to the problem is that the Adreno driver maps a bunch of small-ish regions (~100k each), but there are gaps between them that are basically unusable. Take the following subset of Adreno driver maps from a random ElfLoader crash:

ba0f0000-ba201000 rw-s 00117000 00:0b 8241 /dev/kgsl-3d0
ba690000-ba7a1000 rw-s 000f7000 00:0b 8241 /dev/kgsl-3d0
baa80000-bac63000 rw-s 00074000 00:0b 8241 /dev/kgsl-3d0
bac70000-bac91000 rw-s 00094000 00:0b 8241 /dev/kgsl-3d0
bb067000-bb069000 rw-s 000d8000 00:0b 8241 /dev/kgsl-3d0
bb0d0000-bb1e1000 rw-s 00124000 00:0b 8241 /dev/kgsl-3d0
bb1f0000-bb301000 rw-s 00125000 00:0b 8241 /dev/kgsl-3d0
bb430000-bb541000 rw-s 00127000 00:0b 8241 /dev/kgsl-3d0
bb7c0000-bb926000 rw-s 00105000 00:0b 8241 /dev/kgsl-3d0
bb930000-bba96000 rw-s 00106000 00:0b 8241 /dev/kgsl-3d0
bbbbf000-bbbe0000 rw-s 00031000 00:0b 8241 /dev/kgsl-3d0
bbbe0000-bbc01000 rw-s 0009e000 00:0b 8241 /dev/kgsl-3d0
bbc0a000-bbc2b000 rw-s 000e9000 00:0b 8241 /dev/kgsl-3d0
bbc30000-bbc41000 rw-s 000ed000 00:0b 8241 /dev/kgsl-3d0
bbc50000-bbc87000 rw-s 00091000 00:0b 8241 /dev/kgsl-3d0
bbd6e000-bbd8f000 rw-s 000ae000 00:0b 8241 /dev/kgsl-3d0
bbd90000-bbdb1000 rw-s 000b9000 00:0b 8241 /dev/kgsl-3d0
bc4bf000-bc4c1000 rw-s 0009c000 00:0b 8241 /dev/kgsl-3d0
bc4c6000-bc4c8000 rw-s 0009f000 00:0b 8241 /dev/kgsl-3d0
bc4c8000-bc4ca000 rw-s 000a0000 00:0b 8241 /dev/kgsl-3d0
bc9ca000-bc9db000 rw-s 000aa000 00:0b 8241 /dev/kgsl-3d0
bc9db000-bc9ec000 rw-s 000ab000 00:0b 8241 /dev/kgsl-3d0
bd089000-bd08b000 rw-s 000d2000 00:0b 8241 /dev/kgsl-3d0

The entire range, start to end, is 49917952 bytes (about 48MB). The sum of all the gaps between each mapping is ~37MB. It should be possible to use space in the gaps, but I'm not sure if jemalloc will use it or not depending on the chunk size.
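For illustration, summing the quoted ranges directly reproduces those figures. This sketch just hard-codes the start/end addresses from the listing above:

```c
#include <stddef.h>
#include <stdint.h>

/* Start/end pairs of the /dev/kgsl-3d0 mappings quoted above,
   in ascending address order. */
static const uint64_t kMaps[][2] = {
    {0xba0f0000, 0xba201000}, {0xba690000, 0xba7a1000},
    {0xbaa80000, 0xbac63000}, {0xbac70000, 0xbac91000},
    {0xbb067000, 0xbb069000}, {0xbb0d0000, 0xbb1e1000},
    {0xbb1f0000, 0xbb301000}, {0xbb430000, 0xbb541000},
    {0xbb7c0000, 0xbb926000}, {0xbb930000, 0xbba96000},
    {0xbbbbf000, 0xbbbe0000}, {0xbbbe0000, 0xbbc01000},
    {0xbbc0a000, 0xbbc2b000}, {0xbbc30000, 0xbbc41000},
    {0xbbc50000, 0xbbc87000}, {0xbbd6e000, 0xbbd8f000},
    {0xbbd90000, 0xbbdb1000}, {0xbc4bf000, 0xbc4c1000},
    {0xbc4c6000, 0xbc4c8000}, {0xbc4c8000, 0xbc4ca000},
    {0xbc9ca000, 0xbc9db000}, {0xbc9db000, 0xbc9ec000},
    {0xbd089000, 0xbd08b000},
};
static const size_t kNumMaps = sizeof(kMaps) / sizeof(kMaps[0]);

/* Distance from the first mapping's start to the last one's end. */
uint64_t total_span(void) {
    return kMaps[kNumMaps - 1][1] - kMaps[0][0];
}

/* Total size of the unusable holes between consecutive mappings. */
uint64_t gap_bytes(void) {
    uint64_t gaps = 0;
    for (size_t i = 1; i < kNumMaps; i++)
        gaps += kMaps[i][0] - kMaps[i - 1][1];
    return gaps;
}
```

total_span() gives exactly the 49917952 bytes quoted, and gap_bytes() comes to just over 36MB of holes, in the same ballpark as the ~37MB estimate above.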
We do not have this crash on the Nexus 6P, which uses an Adreno 430, and I noticed there that there are usually no gaps between the kgsl mappings. Even when there are gaps, they seem to be much smaller than we see with the 330.
Assignee
Comment 62•8 years ago
This is fascinating. It looks to me like the sizes vary a bit, but there are lots of ~1.06MB ones, which could be a 512x512 texture + metadata.
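The arithmetic behind that guess, for reference - the 512x512 size and 4 bytes per texel are assumptions, not something confirmed from the driver:

```c
#include <stdint.h>

/* Bytes of pixel data for a w x h texture at bpp bytes per texel.
   512x512 RGBA comes to exactly 1 MiB, so a ~1.06MB mapping would
   leave roughly 60KB for driver bookkeeping per allocation -- an
   inference, not something the driver documents. */
uint64_t tex_bytes(uint32_t w, uint32_t h, uint32_t bpp) {
    return (uint64_t)w * h * bpp;
}
```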
I can artificially recreate this crash by mmapping a bunch of crap on startup, and then just browsing to Wikipedia crashes almost immediately. I get the following very sensible stack when attached with gdb (pager prompts removed and argument dumps elided for readability):
#0  ElfLoader::~ElfLoader (this=0x77d7c1dc <ElfLoader::Singleton>, __in_chrg=<optimized out>) at /Users/snorp/source/gecko/mozglue/linker/ElfLoader.cpp:521
#1  0x4013bcae in __cxa_finalize () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/lib/libc.so
#2  0x4013bea8 in exit () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/lib/libc.so
#3  0x737e02f0 in ioctl_kgsl_sharedmem_alloc () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libgsl.so
#4  0x737d590a in gsl_memory_alloc_pure () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libgsl.so
#5  0x76f067d0 in rb_alloc_gfx_mem_pure () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#6  0x76f0690e in rb_mempool_dynamic_alloc_pure () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#7  0x76f0fc64 in rb_texture_update_hw_subimage () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#8  0x76f10c66 in rb_texture_loadsubimage () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#9  0x76eede14 in TexSubImageLoad () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#10 0x76eee030 in core_glTexSubImage2D () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#11 0x76ecd590 in glTexSubImage2D () from /Users/snorp/.mozbuild/android-device/jimdb-arm/lib/4d2254620744d908/system/vendor/lib/libGLESv2_adreno.so
#12 0x7d182846 in mozilla::gl::GLContext::fTexSubImage2D (...) at /Users/snorp/source/gecko/gfx/gl/GLContext.h:1637
#13 0x7d176574 in mozilla::gl::TexSubImage2DHelper (...) at /Users/snorp/source/gecko/gfx/gl/GLUploadHelpers.cpp:243
#14 0x7d180210 in mozilla::gl::UploadImageDataToTexture (...) at /Users/snorp/source/gecko/gfx/gl/GLUploadHelpers.cpp:583
#15 0x7d1805e0 in mozilla::gl::UploadSurfaceToTexture (...) at /Users/snorp/source/gecko/gfx/gl/GLUploadHelpers.cpp:628
#16 0x7d180716 in mozilla::gl::BasicTextureImage::DirectUpdate (...) at /Users/snorp/source/gecko/gfx/gl/GLTextureImage.cpp:243
#17 0x7d17ee12 in mozilla::gl::TextureImage::UpdateFromDataSource (...) at /Users/snorp/source/gecko/gfx/gl/GLTextureImage.cpp:88
#18 0x7d1f3652 in mozilla::layers::TextureImageTextureSourceOGL::Update (...) at /Users/snorp/source/gecko/gfx/layers/opengl/TextureHostOGL.cpp:195
#19 0x7d1d8bb8 in mozilla::layers::BufferTextureHost::Upload (...) at /Users/snorp/source/gecko/gfx/layers/composite/TextureHost.cpp:741
#20 0x7d1d8c3c in mozilla::layers::BufferTextureHost::MaybeUpload (...) at /Users/snorp/source/gecko/gfx/layers/composite/TextureHost.cpp:618
#21 0x7d1d2b64 in mozilla::layers::UseTileTexture (...) at /Users/snorp/source/gecko/gfx/layers/composite/TiledContentHost.cpp:162
#22 0x7d1d7fe0 in mozilla::layers::TiledLayerBufferComposite::UseTiles (...) at /Users/snorp/source/gecko/gfx/layers/composite/TiledContentHost.cpp:382
#23 0x7d1d9b72 in mozilla::layers::CompositableParentManager::ReceiveCompositableUpdate (...) at /Users/snorp/source/gecko/gfx/layers/ipc/CompositableTransactionParent.cpp:121
#24 0x7d1eb23c in mozilla::layers::LayerTransactionParent::RecvUpdate (...) at /Users/snorp/source/gecko/gfx/layers/ipc/LayerTransactionParent.cpp:584
#25 0x7d1eb62c in mozilla::layers::LayerTransactionParent::RecvUpdateNoSwap (...) at /Users/snorp/source/gecko/gfx/layers/ipc/LayerTransactionParent.cpp:207
OK, might be barking up the wrong tree here. I'm seeing more than 2000 mappings of libc.so, taking over 600MB of address space. That's probably wrong.
Sigh. Nevermind. Those mappings were ones I created myself to try to recreate this easier :/
Comment 66•8 years ago
We suspect running WebView instances in the same process as Gecko could be one cause of excessive GPU memory mappings.
I'm pretty convinced at this point that we're hitting address space exhaustion. All of the raw dumps I've examined have mappings right up to the 3GB limit. On many, the first mapping (for /system/bin/app_process) is around 0x40000000, and any automatically-assigned mappings (nullptr for first arg to mmap) do not ever get assigned below that. That means we're left with ~2GB of addressable space, and with all of the junk we inherit from the zygote, we can apparently exhaust that. I am able to map stuff below 0x40000000, but I have to pass the address myself. If I just pass 0x1000 repeatedly, the kernel doesn't put it at the next available space, but behaves as if you pass nullptr. Maybe we could consider modifying jemalloc to use this lower area and free up some address space? I think those would be the easiest mappings to force lower...
It seems someone else has struggled with this problem and wrote a hack to allocate pages below the 0x40000000 start address. http://bitwagon.com/tub.html
Comment 69•8 years ago
Comment on attachment 8740833 [details] [diff] [review]
Run search activity in a separate process (v1)

Review of attachment 8740833 [details] [diff] [review]:
-----------------------------------------------------------------
Worth a shot.
Updated•8 years ago
Comment 71•8 years ago
bugherder:
https://hg.mozilla.org/mozilla-central/rev/4a9ce80d7a1b
Comment 72•8 years ago
There have been three occurrences of this crash signature in the Nightly 20160501030217 build:
https://crash-stats.mozilla.com/report/index/5f3419ba-cd2d-466b-84dd-f15942160503
https://crash-stats.mozilla.com/report/index/3dc2003c-ad74-4be5-8a6d-b546c2160503
https://crash-stats.mozilla.com/report/index/912fdbac-e524-4fa0-9fda-ebef62160503
Updated•8 years ago
I don't think there is anything really actionable on this bug at this point, but it would be nice to find a fix.
Comment 75•8 years ago
Someone sent me this: https://twitter.com/zioscozio/status/746297797511643136 on June 24: "today's @guardian homepage reliably triggers bug 1164027 on a samsung tablet"
Comment 76•8 years ago
(In reply to Nicholas Nethercote [:njn] from comment #75)
> Someone sent me this: https://twitter.com/zioscozio/status/746297797511643136 on June 24:
> "today's @guardian homepage reliably triggers bug 1164027 on a samsung tablet"
Probably an OOM crash.
Assignee
Comment 77•8 years ago
I believe there is a bug in the Adreno 3xx driver on Android <= 4.4 which segfaults when glTexImage2D (or similar) fails due to virtual memory exhaustion. I have a proof-of-concept patch which avoids calling this function when we think it would fail, therefore avoiding the crash. The textures of course do not get allocated, so the user would see e.g. black tiles, but it does avoid the crash. A few improvements still to be made, but I should have a patch for review soon.
Assignee
Comment 78•8 years ago
There is a bug in the graphics driver for Adreno 3xx GPUs on Android versions up to 4.4 which segfaults if a texture allocation fails due to virtual memory exhaustion. On affected devices, guess whether an allocation would succeed and avoid calling glTexImage2D where we believe it would fail. As the texture will not be allocated this will likely cause problems such as black rectangles appearing on the screen, but that is preferable to crashing.
Review commit: https://reviewboard.mozilla.org/r/65012/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/65012/
Comment on attachment 8772117 [details]
Bug 1164027 - Do not call glTexImage2D if likely to fail on Adreno 3xx devices;
https://reviewboard.mozilla.org/r/65012/#review62002
I like this, but I would like to track how often this gets called. Can we do a telemetry thing?
Comment 80•8 years ago
https://reviewboard.mozilla.org/r/65012/#review62044
I'm not a huge fan of doing an extra mmap on every texture allocation...
Assignee
Comment 81•8 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
> https://reviewboard.mozilla.org/r/65012/#review62044
>
> I'm not a huge fan of doing an extra mmap on every texture allocation...

Yeah, it totally sucks. But it's the best way I can think of to avoid this crash, and since this is the top crash on Android I think it's necessary. The patch only does the mmap on affected devices.
Assignee
Comment 82•8 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=695156a4e10af14088530a24a863fa825d2629ad

Spoke to Jeff on IRC; we can't think of a better way than the mmap, so let's get this checked in. Will think about how best to do telemetry and follow up.
Comment 83•8 years ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/fx-team/rev/fff0c4030b81
Do not call glTexImage2D if likely to fail on Adreno 3xx devices; r=jrmuizel,snorp
Comment 84•8 years ago
+1 for fixing the #2 topcrash.
Comment 85•8 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/fff0c4030b81
Comment 86•8 years ago
https://reviewboard.mozilla.org/r/65012/#review63028

::: gfx/gl/GLContext.cpp:2891
(Diff revision 1)
> +  // there to be double the actual size of the texture available.
> +  size_t size = width * height * GetBytesPerTexel(format, type) * 2;
> +
> +  int fd = open("/dev/zero", O_RDONLY);
> +
> +  void *p = mmap(nullptr, size, PROT_NONE, MAP_SHARED, fd, 0);

Does this need to be MAP_SHARED, or can you use MAP_ANON without needing to open /dev/zero?
Assignee
Comment 87•8 years ago
Yes, that would be better. I tried to add this to MozReview but it got confused, so I'm just using an old-fashioned patch.
Assignee
Comment 88•8 years ago
This won't affect any tests, but proof it compiles:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bac02e960cf99cc298e2c516a59fa0833c03b1c9
Comment 89•8 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0dbf7d22b063
Use MAP_ANONYMOUS instead of opening /dev/zero. r=jrmuizel
Comment 90•8 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0dbf7d22b063
Assignee
Comment 91•8 years ago
This crash is still occurring on Nightly. If anyone can still reproduce this on Nightly and has a URL, that would be very useful. This could be because texture allocation functions other than glTexImage2D are being called and it's crashing during those. Or, worse, the mmap check isn't robust enough. I will write a patch to perform the check for various other functions as well as glTexImage2D.
Do you have a link to a more recent crash than the one in comment 0?
Assignee
Comment 93•8 years ago
Here's one: https://crash-stats.mozilla.com/report/index/352e2058-0167-4087-8268-330492160726

(I just went to the signature, then the reports tab, then sorted by build ID.)
Comment 94•8 years ago
Comment on attachment 8772117 [details]
Bug 1164027 - Do not call glTexImage2D if likely to fail on Adreno 3xx devices;

Flipping this r? to r+ based on comment 82 and the fact that this patch already landed in comment 83.
Comment 95•8 years ago
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Comment 96•8 years ago
(In reply to Jamie Nicol [:jnicol] from comment #91)
> This crash is still occurring on nightly. If anyone can still reproduce this
> on nightly and has a url that would be very useful.

I don't have a Nightly build (it's not available on Google Play), but I consistently see the crash (in the released version) on http://www.zpravy.cz/ if I try to zoom in on text. Could somebody try it on a Nightly build on an affected system?
Assignee
Comment 97•8 years ago
(In reply to Adam Hauner from comment #96)
> I haven't nightly build (not available on Google Play), but I constantly see
> crash (in released version) on http://www.zpravy.cz/ if I try to zoom in
> text. Could somebody try it on the nightly build on affected system?

Hi Adam. Zooming in causing too much memory use is a bug we are aware of; see bug 1306371. We would expect that to cause this crash on Firefox 49. If you could try the current beta, version 50, available from the Play Store, it would be interesting to know whether you can reproduce this crash using it. Also, could you tell us which phone/tablet and Android version you are running? Thanks!
Comment 98•8 years ago
(In reply to Jamie Nicol [:jnicol] from comment #97)
> Hi Adam. Zooming in causing too much memory is a bug we are aware of, see
> bug 1306371. We would expect that to cause this crash on Firefox 49.

Crash from Fx 49 on Samsung GT-I9301I, Android 4.4.2, Adreno 305:
https://crash-stats.mozilla.com/report/index/a6ea5405-fa18-4ddd-81bd-03d182160930

> If you could try the current beta version 50, available from the play store,

I'm not able to reproduce the crash on http://www.zpravy.cz/ by zooming in with Fx Beta 50.0b1 (20160920192219) from Play on the same phone.
Assignee
Comment 99•7 years ago
The fix has now reached stable, and the crash rates on 50.0 are significantly lower than on 49: ~150 per day, down from two or three thousand. \o/

I'm not sure how to reduce it further. There are certainly other places in which the driver could attempt to allocate memory, fail, and therefore crash. I've noticed it happen when rotating the screen when memory is exhausted, and I'm sure it could happen in a glDrawArrays/glDrawElements call if memory is low, but the fix we have doesn't work so cleanly for those.

If somebody who can still reproduce this crash on 50.0 or later gets in touch, that would be helpful. But for now, assuming crash rates on 50 don't increase much further, we probably have bigger fish to fry.
Comment 100•7 years ago
(In reply to Jamie Nicol [:jnicol] from comment #99)
> The fix has now reached stable and the crash rates on 50.0 are significantly
> lower than on 49. ~150 per day down from two or three thousand \o/

Great work, Jamie (and reviewers)!

> I'm not sure how to reduce it further. There are certainly other places in
> which the driver could attempt to allocate memory, fail, and therefore
> crash. I've noticed it happen when rotating the screen when memory is
> exhausted. I'm sure it could happen in a glDrawArrays/Elements if memory is
> low. But the fix we have doesn't work so cleanly on those.
>
> If somebody who can still reproduce this crash on 50.0 or later gets in
> touch that would be helpful. But now, assuming crash rates on 50 don't
> increase much further, we probably have bigger fish to fry.

I concur, for what that's worth :)
Comment 101•7 years ago
[Tracking Requested - why for this release]:

FennecAndroid 52.0.2  4225  47.5%  3451
FennecAndroid 47.0     678   7.6%   281
FennecAndroid 48.0     484   5.4%   197
FennecAndroid 50.1.0   419   4.7%   190
FennecAndroid 49.0.2   387   4.3%   156
FennecAndroid 46.0.1   358   4.0%    97
FennecAndroid 51.0.3   317   3.6%   170
FennecAndroid 49.0     266   3.0%   130
FennecAndroid 52.0.1   231   2.6%   139
FennecAndroid 39.0     143   1.6%    20
FennecAndroid 44.0.2   118   1.3%    59
FennecAndroid 52.0     109   1.2%    51
Comment 102•7 years ago
I am able to reproduce this crash on http://www.zpravy.cz/ by zooming in and out, scrolling, and rotating the display from portrait to landscape with Firefox Beta 53.0b7.
Assignee
Comment 103•7 years ago
Thanks for the report, skywaler333. I think I jumped the gun slightly in comment 99; these numbers are still too high. I am aware that rotating the display can still cause this (and I've not been able to reproduce it in any other way). IIRC the EGL initialisation must allocate some memory, which causes this crash. Perhaps we can release all resources prior to that happening.
Comment 104•7 years ago
Too late for a fix for 52/53 as we are about to release 53. We could still take a patch for 54 though.
Comment 105•7 years ago
We can track this for 55, although I only see a few crashes there right now.
Comment 106•7 years ago
bp-668a4907-f8ab-4c9e-8318-3f2810170430

FennecAndroid 55.0a1 Crash Report [@ ElfLoader::~ElfLoader ]

Release Channel: nightly
Version: 55.0a1
Build ID: 20170428100509
OS: Android
OS Version: 0.0.0 Linux 3.4.0-1974790 #1 SMP PREEMPT Fri Oct 25 08:41:54 KST 2013 armv7l
Android Version: 18 (REL)
Build Architecture: arm
Build Architecture Info: ARMv7 Qualcomm Krait features: swp,half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt | 4
Android Manufacturer: samsung
Android Model: SM-N900W8
Android CPU ABI: armeabi-v7a
Adapter Vendor ID: Qualcomm
Adapter Device ID: Adreno (TM) 330
Startup Crash: False
MOZ_CRASH Reason: MOZ_CRASH(Unexpected shutdown)
Crash Reason: SIGSEGV
Crash Address: 0x0
App Notes: FP(D00-L1010-W00000000-T010) EGL? EGL+ GL Context? GL Context+ AdapterDescription: 'Model: SM-N900W8, Product: hltevl, Manufacturer: samsung, Hardware: qcom, OpenGL: Qualcomm -- Adreno (TM) 330 -- OpenGL ES 3.0 V@45.0 AU@04.03.00.125.097 RVADDULA_AU_LINUX_ANDROID_JB_3.1.2.04.03.00.125.097+PATCH[ES]_msm8974_JB_3.1.2_CL3905453_release_ENGG (CL@3905453)' WR? WR- GL Layers? GL Layers+
Device: samsung SM-N900W8 samsung/hltevl/hltecan:4.3/JSS15J/N900W8VLUBMJ4:user/release-keys
Processor Notes: processor_ip-172-31-46-42_1291; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
Comment 107•7 years ago
Too late for 54 as we've built the 54 RC. Marking 54 wontfix.
Comment 108•7 years ago
Hi Kats, I am using 54.0b14, and the STR from comment 102 lead to a 100% repro on my Nexus 5X on Android 7.1.2. I just wanted to poke, as the investigation on this one seems stalled, and with good STR I hope we can take another look at what needs fixing. The other bit I wanted to flag is that the crash doesn't trigger a "Crash submission" on my phone. Is that expected? In my case, the tab just disappears with the STR. Thanks!
I believe Jamie is still looking into this issue. The lack of crash reporter might be because this is an OOM and Android just kills the app without triggering the relevant signal handlers that bring up the crash reporter. In some cases the OOM might manifest as an allocation failure in the app which can cause a crash, but in other cases the OS might just intervene before that happens.
Assignee
Comment 110•7 years ago
Hi Ritu. I believe the crash reporter not showing up is due to the system being totally out of memory. See bug 1041968. The page in the STR does indeed cause us to use too much memory, leading to various different crashes. For me at least, it doesn't cause this crash. It might be that you're actually triggering a different crash. I am (slowly) working on a fix in bug 1357359 for that page. This specific crash is caused by a driver bug when memory is low. I have an idea for a fix, but not sure when I'll get round to fixing it - I can't actually reproduce it myself.
Comment 111•7 years ago
Thanks Kats and Jamie. I might be willing to lend my personal phone at the work week for a live repro/debug session. Just saying! :)
Comment 112•7 years ago
Marking 56 as affected. Currently #10 top browser crash on Fennec.
Comment 113•6 years ago
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195 Needinfo :susheel if you think this bug should be re-triaged.
Comment 114•5 years ago
The leave-open keyword is there and there is no activity for 6 months.
:jnicol, maybe it's time to close this bug?
Assignee
Comment 115•5 years ago
Yeah, I think we can close this now. We've done everything we realistically can do, and the number of people still on affected devices without updating their OS will continue to decrease.
Comment 116•5 years ago
We just started publishing x86_64 Fennec Nightly builds on the Google Play Store (bug 1505538). So far we have only one x86_64 crash report, and it has this ElfLoader crash signature. The user is running Android 8.1 (Oreo) on an Acer Aspire S3 laptop.
Assignee
Comment 117•5 years ago
It doesn't make sense to treat that as the same bug. This bug was a driver bug on some Adreno GPUs causing a crash. That laptop has completely different hardware, and the stack is different apart from the top function, ~ElfLoader. This function deliberately crashes if it detects an unexpected shutdown, so it's expected that multiple different problems result in that function crashing.
We should file a new bug for that instead.
Comment 118•5 years ago
OK. I filed bug 1529196.