Closed Bug 626051 Opened 14 years ago Closed 14 years ago

Fennec crash [@ libz.so@0x14b4 ] [@ libz.so@0x14d4 ] [ @ nsACString_internal::Replace | mozilla::ipc::AsyncChannel::OnDispatchMessage ]

Categories

(Core :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla5
Tracking Status
blocking2.0 --- final+
fennec 4.0.1+ ---

People

(Reporter: scoobidiver, Assigned: mwu)

References

Details

(Keywords: crash, relnote, topcrash, Whiteboard: [hardblocker])

Crash Data

Attachments

(1 file, 3 obsolete files)

It is a new crash signature in Fennec 4.0b3. It is #4 top crasher in Fennec 4.0b3 for the last week.

Signature: libz.so@0x14b4
UUID: b3a334be-117d-4a2f-aa64-f01ca2110114
Time: 2011-01-14 23:05:24.64066
Uptime: 0
Install Age: 131 seconds since version was first installed.
Product: Fennec
Version: 4.0b3
Build ID: 20101221205132
Branch: 1.9
OS: Linux
OS Version: 0.0.0 Linux 2.6.32.28-cyanogenmod #1 PREEMPT Wed Jan 12 15:08:49 CST 2011 armv7l
CPU: arm
Crash Reason: SIGSEGV
Crash Address: 0xc6e0df26

Frame  Module  Signature  Source
0  libz.so  libz.so@0x14b4
1  org.mozilla.firefox-1.apk  org.mozilla.firefox-1.apk@0x1adc00
2  org.mozilla.firefox-1.apk  org.mozilla.firefox-1.apk@0x1ade13
3  plugin-container  __gnu_unwind_pr_common  unwind-arm.c:1225
4  libmozalloc.so  moz_malloc  memory/mozalloc/mozalloc.cpp:109
5  @0xbecf0900
6  libz.so  libz.so@0x12447
7  libz.so  libz.so@0x12447
8  libz.so  libz.so@0x132bf
9  libz.so  libz.so@0x1246d
10  libxul.so  nsACString_internal::Replace  xpcom/string/src/nsTSubstring.cpp:488
11  libxul.so  nsFrameScriptExecutor::LoadFrameScriptInternal  content/base/src/nsFrameMessageManager.cpp:659
12  libxul.so  mozilla::dom::TabChild::RecvLoadRemoteScript  dom/ipc/TabChild.cpp:749
13  libxul.so  mozilla::dom::PBrowserChild::OnMessageReceived  PBrowserChild.cpp:1211
14  libxul.so  mozilla::dom::PContentChild::OnMessageReceived  PContentChild.cpp:949
15  libxul.so  mozilla::ipc::AsyncChannel::OnDispatchMessage  ipc/glue/AsyncChannel.cpp:262
16  libxul.so  mozilla::ipc::RPCChannel::OnMaybeDequeueOne  ipc/glue/RPCChannel.cpp:440
17  libxul.so  RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run  ipc/chromium/src/base/task.h:308
18  libxul.so  mozilla::ipc::RPCChannel::DequeueTask::Run  RPCChannel.h:475
19  libxul.so  MessageLoop::RunTask  ipc/chromium/src/base/message_loop.cc:344
20  libxul.so  MessageLoop::DeferOrRunPendingTask  ipc/chromium/src/base/message_loop.cc:354
21  libxul.so  MessageLoop::DoWork  ipc/chromium/src/base/message_loop.cc:451
22  libxul.so  mozilla::ipc::MessagePump::Run  ipc/glue/MessagePump.cpp:115
23  libxul.so  mozilla::ipc::MessagePumpForChildProcess::Run  ipc/glue/MessagePump.cpp:230
24  libxul.so  MessageLoop::RunInternal  ipc/chromium/src/base/message_loop.cc:220
25  libxul.so  MessageLoop::Run  ipc/chromium/src/base/message_loop.cc:512
26  libxul.so  nsBaseAppShell::Run  widget/src/xpwidgets/nsBaseAppShell.cpp:198
27  libxul.so  XRE_RunAppShell  toolkit/xre/nsEmbedFunctions.cpp:631
28  libxul.so  mozilla::ipc::MessagePumpForChildProcess::Run  ipc/glue/MessagePump.cpp:222
29  libxul.so  MessageLoop::RunInternal  ipc/chromium/src/base/message_loop.cc:220
30  libxul.so  MessageLoop::Run  ipc/chromium/src/base/message_loop.cc:512
31  libxul.so  XRE_InitChildProcess  toolkit/xre/nsEmbedFunctions.cpp:510
32  libmozutils.so  ChildProcessInit  other-licenses/android/APKOpen.cpp:691
33  plugin-container  main  ipc/app/MozillaRuntimeMainAndroid.cpp:69
34  libc.so  libc.so@0x14c74

More reports at: http://crash-stats.mozilla.com/query/query?product=Fennec&range_value=4&range_unit=weeks&query_search=signature&query_type=startswith&query=libz.so@0x14&build_id=&process_type=any&hang_type=any&do_query=1
tracking-fennec: --- → ?
I find it particularly strange how 779 crashes showed up in the past two days for a previously unknown signature.
Actually, the fact that it showed up on both 4.0b3 and 4.0b4pre at the same time leads me to suspect that something external changed, like a system upgrade perhaps?
It is now #1 top crasher in Fennec 4.0b3 (33% of all crashes) so comment 2 is good.
I'm wondering if some set of devices got an update with a new libz that we don't get along with. Unfortunately, I don't see any crash reports with device information so we can't narrow down which devices are affected. I wonder if using our own libz would make this go away.
Among crashes, libz.so debug identifier is mainly: D85FCD31A7FE605564DE920A8F1117460 Sometimes: D85FCD31A7FE605564DE920A8F1167E90
blocking2.0: --- → ?
tracking-fennec: ? → 2.0+
Assignee: nobody → mwu
First crash happened on 1/14/11 at 4:02 in 4.0b3. First crash happened on 1/14/11 at 9:51 in 4.0b4pre.
The stack beyond the top frame is probably completely bogus, let's assign this to general for now.
blocking2.0: ? → final+
Component: IPC → General
QA Contact: ipc → general
Actually, from about frame 11 on down looks totally sane. 2-10 are definitely off in the weeds, though. If someone can find a copy of that libz.so that matches one of these crash reports, we can probably get a more sensible stack out of it (even with just export symbols, there should be enough CFI to get us to the right caller frame).
Whiteboard: [hardblocker]
As far as I can tell, most if not all of the devices are using some sort of hacked firmware. All the ones that say 2.6.32.28.S10.4.OC-ga36929b-dirty, for example, are using some cyanogen gingerbread, and a bunch more actually identify themselves as cyanogen in the kernel version string.
As an android only issue, this probably isn't blocking2.0. (but blocking-fennec, certainly)
blocking2.0: final+ → ?
2.0 and fennec are roughly the same thing. Although given what we know, is this just a cyanogen bug found by people who aren't running production devices?
blocking2.0: ? → final+
(In reply to comment #12)
> 2.0 and fennec are roughly the same thing. Although given what we know, is this
> just a cyanogen bug found by people who aren't running production devices?

This is a cyanogen bug found by people running production devices. Cyanogen, however, is a non-stock, non-production Android build/firmware that features a great number of non-upstream changes. One of them is a change to zlib which crashes us. Apparently, a large number of our nightly users like to run cyanogen. The particular bug that's biting us/them should be addressed "upstream" with the cyanogen devs IMHO, since we do want to take advantage of zlib optimization where it exists.
I can confirm that CM7 (the Gingerbread version) is affected. When the bug hits, it hits a large number of times in a row and creates a crash report for each, which is probably why the number of crashes is so large. I'm easily looking at over 50 reports, six of which have bp- ID numbers, from the last time this struck. Due to CM policy, I can't file an issue in their tracker, but I will go ahead and put a patch up on CM gerrit to revert the change mwu identified and bring it to the attention of the core team.
I've reverted the patch, and contacted the author with a link to the crash report and this bug report. Nobody wants to break apps, thanks for the heads up.
I can now also confirm that the rollback of the zlib optimization in CyanogenMOD has cleared this up for me.
If you ever see any insanity like this in the future on a CM build of Android, feel free to contact me directly. CM7 isn't actually released yet, we only have nightly builds that aren't RC yet. We usually ask that people not file bugs on nightlies because they are almost always feature requests or things in rapid development, but app breakage is a different story- especially native apps like Fennec.
Let's resolve this, then.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Keywords: relnote
We're now seeing this in builds other than old CM7 nightlies, including a report that it is in the official version of Android 2.3.3 for the new HTC Desire S: https://support.mozilla.com/en-US/questions/803031

This is the #1 topcrasher for Fennec 4.0. Re-opening and nominating for 4.0.1.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [hardblocker] → [hardblocker][4.0.1?]
Given this looks to be our #1 crasher, +'ing for 4.0.1
tracking-fennec: 2.0+ → 4.0.1+
Bummer, I wish CodeAurora Forum would have responded when I tried to contact them about the ABI break.
Attached patch WIP (obsolete)
This disables system zlib, and tries to fix the build to work with the in-tree zlib instead. But freetype still fails to build without system zlib, and I haven't yet figured out how to fix that. If someone who understands the build system wants to take this, please do.
Attached patch Optimize script reading (obsolete)
Well, for some reason, this optimized zlib copy doesn't like it when we decompress in 8 KB chunks. Which is fine, since reading everything at once is how we do it everywhere else, and it involves fewer copies and fewer lines of code.
Attachment #523344 - Attachment is obsolete: true
Attachment #523660 - Flags: review?(Olli.Pettay)
Comment on attachment 523660
Optimize script reading

>+  if (!buffer ||
>+      NS_FAILED(input->Read(buffer, avail, &read)) ||
>+      read != avail) {
>+    return;

I asked biesi about this and currently it works, but it is not promised by the contract. So better to call Read in a loop until it returns 0. Could you update the patch?
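For readers unfamiliar with the concern raised in this review: a stream's Read call may legally hand back fewer bytes than were asked for, so a single call followed by a `read != avail` check can fail on a perfectly healthy stream. Below is a minimal, self-contained sketch of the "call Read in a loop until it returns 0" pattern. `ShortReadStream` and `ReadAll` are hypothetical names invented for this illustration; they are not Gecko classes, and this is not the code from any attachment on this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for an input stream whose Read() may return fewer
// bytes than requested (a "short read"). Not a Gecko class.
class ShortReadStream {
 public:
  ShortReadStream(std::string data, std::size_t max_per_call)
      : data_(std::move(data)), max_per_call_(max_per_call) {
    assert(max_per_call_ > 0);
  }

  // Number of bytes not yet consumed.
  std::size_t Available() const { return data_.size() - pos_; }

  // Copies up to `count` bytes into `buf` and returns the number copied.
  // Returns 0 only at end of stream.
  std::size_t Read(char* buf, std::size_t count) {
    const std::size_t n = std::min({count, max_per_call_, Available()});
    std::memcpy(buf, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  std::size_t max_per_call_;
  std::size_t pos_ = 0;
};

// Sizes `out` once to the available byte count, then calls Read() until the
// buffer is full or Read() signals end of stream by returning 0.
// Returns false if the stream ended before `avail` bytes arrived.
bool ReadAll(ShortReadStream& stream, std::string& out) {
  const std::size_t avail = stream.Available();
  out.resize(avail);
  std::size_t total = 0;
  while (total < avail) {
    const std::size_t n = stream.Read(&out[total], avail - total);
    if (n == 0) {
      break;  // End of stream: stop instead of spinning forever.
    }
    total += n;
  }
  out.resize(total);
  return total == avail;
}
```

With a stream constructed as `ShortReadStream s("some script text", 4)`, each Read() yields at most 4 bytes, yet `ReadAll(s, out)` still fills `out` with the whole input. The destination is sized once up front, so the loop adds no intermediate copies compared with a single Read() call.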
Attachment #523660 - Flags: review?(Olli.Pettay)
(In reply to comment #19)
> We're now seeing this in builds other than old CM7 nightlies, including a
> report that it is in the official version of Android 2.3.3. for the new HTC
> Desire S:

I can confirm this on my HTC Desire S with either Fennec 4.0 or Fennec nightlies (last tried with a 20110406xx build). The device was also reset, but the error also shows on a freshly installed device. One crash report (others were throttled): http://crash-stats.mozilla.com/report/index/bp-16ce1dd4-723d-48e3-a6d8-c1dcd2110405
Attached patch Optimize script reading, v2 (obsolete)
Attachment #523660 - Attachment is obsolete: true
Attachment #524242 - Flags: review?(Olli.Pettay)
Comment on attachment 524242
Optimize script reading, v2

So why can you just use NS_ReadInputStreamToString which I linked to?

rv = NS_ReadInputStreamToString(input, data, avail);
if (NS_FAILED(rv)) {
  return;
}
why can't you...
Attachment #524242 - Attachment is obsolete: true
Attachment #524250 - Flags: review?(Olli.Pettay)
Attachment #524242 - Flags: review?(Olli.Pettay)
Attachment #524250 - Flags: review?(Olli.Pettay) → review+
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
This should also be landed on mozilla-2.1.
Whiteboard: [hardblocker][4.0.1?] → [hardblocker][needs to land on mozilla-2.1]
Keywords: checkin-needed
Keywords: checkin-needed
Whiteboard: [hardblocker][needs to land on mozilla-2.1] → [hardblocker]
Target Milestone: --- → mozilla5
Verified Fixed using a Desire S Mozilla/5.0 (Android; Linux armv7l; rv:2.1.1) Gecko/20110415 Firefox/4.0.2pre Fennec/4.0.1 ID:20110415172201
Status: RESOLVED → VERIFIED
v. Mozilla/5.0 (Android; Linux armv7l; rv:6.0a1) Gecko/20110419 Firefox/6.0a1 Fennec/6.0a1 ID:20110419042214
Crash Signature: [@ libz.so@0x14b4 ] [@ libz.so@0x14d4 ]