Closed Bug 626051 Opened 14 years ago Closed 13 years ago

Fennec crash [@ libz.so@0x14b4 ] [@ libz.so@0x14d4 ] [ @ nsACString_internal::Replace | mozilla::ipc::AsyncChannel::OnDispatchMessage ]

Categories

(Core :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

VERIFIED FIXED
mozilla5
Tracking Status
blocking2.0 --- final+
fennec 4.0.1+ ---

People

(Reporter: scoobidiver, Assigned: mwu)

Details

(Keywords: crash, relnote, topcrash, Whiteboard: [hardblocker])

Attachments

(1 file, 3 obsolete files)

It is a new crash signature in Fennec 4.0b3.
It is the #4 top crasher in Fennec 4.0b3 over the last week.

Signature	libz.so@0x14b4
UUID	b3a334be-117d-4a2f-aa64-f01ca2110114
Time 	2011-01-14 23:05:24.64066
Uptime	0
Install Age	131 seconds since version was first installed.
Product	Fennec
Version	4.0b3
Build ID	20101221205132
Branch	1.9
OS	Linux
OS Version	0.0.0 Linux 2.6.32.28-cyanogenmod #1 PREEMPT Wed Jan 12 15:08:49 CST 2011 armv7l
CPU	arm
Crash Reason	SIGSEGV
Crash Address	0xc6e0df26

Frame 	Module 	Signature 	Source
0 	libz.so 	libz.so@0x14b4 	
1 	org.mozilla.firefox-1.apk 	org.mozilla.firefox-1.apk@0x1adc00 	
2 	org.mozilla.firefox-1.apk 	org.mozilla.firefox-1.apk@0x1ade13 	
3 	plugin-container 	__gnu_unwind_pr_common 	unwind-arm.c:1225
4 	libmozalloc.so 	moz_malloc 	memory/mozalloc/mozalloc.cpp:109
5 		@0xbecf0900 	
6 	libz.so 	libz.so@0x12447 	
7 	libz.so 	libz.so@0x12447 	
8 	libz.so 	libz.so@0x132bf 	
9 	libz.so 	libz.so@0x1246d 	
10 	libxul.so 	nsACString_internal::Replace 	xpcom/string/src/nsTSubstring.cpp:488
11 	libxul.so 	nsFrameScriptExecutor::LoadFrameScriptInternal 	content/base/src/nsFrameMessageManager.cpp:659
12 	libxul.so 	mozilla::dom::TabChild::RecvLoadRemoteScript 	dom/ipc/TabChild.cpp:749
13 	libxul.so 	mozilla::dom::PBrowserChild::OnMessageReceived 	PBrowserChild.cpp:1211
14 	libxul.so 	mozilla::dom::PContentChild::OnMessageReceived 	PContentChild.cpp:949
15 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
16 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:440
17 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
18 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:475
19 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:344
20 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:354
21 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
22 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:115
23 	libxul.so 	mozilla::ipc::MessagePumpForChildProcess::Run 	ipc/glue/MessagePump.cpp:230
24 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
25 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
26 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:198
27 	libxul.so 	XRE_RunAppShell 	toolkit/xre/nsEmbedFunctions.cpp:631
28 	libxul.so 	mozilla::ipc::MessagePumpForChildProcess::Run 	ipc/glue/MessagePump.cpp:222
29 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
30 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
31 	libxul.so 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp:510
32 	libmozutils.so 	ChildProcessInit 	other-licenses/android/APKOpen.cpp:691
33 	plugin-container 	main 	ipc/app/MozillaRuntimeMainAndroid.cpp:69
34 	libc.so 	libc.so@0x14c74 	

More reports at:
http://crash-stats.mozilla.com/query/query?product=Fennec&range_value=4&range_unit=weeks&query_search=signature&query_type=startswith&query=libz.so@0x14&build_id=&process_type=any&hang_type=any&do_query=1
tracking-fennec: --- → ?
I find it particularly strange how 779 crashes showed up in the past two days for a previously unknown signature.
Actually, the fact that it showed up on both 4.0b3 and 4.0b4pre at the same time leads me to suspect that something external changed, like a system upgrade perhaps?
It is now the #1 top crasher in Fennec 4.0b3 (33% of all crashes), so comment 2 looks right.
I'm wondering if some set of devices got an update with a new libz that we don't get along with. Unfortunately, I don't see any crash reports with device information so we can't narrow down which devices are affected. 

I wonder if using our own libz would make this go away.
Among crashes, libz.so debug identifier is mainly:
D85FCD31A7FE605564DE920A8F1117460
Sometimes: D85FCD31A7FE605564DE920A8F1167E90
blocking2.0: --- → ?
tracking-fennec: ? → 2.0+
Assignee: nobody → mwu
First crash happened on 1/14/11 at 4:02 in 4.0b3.
First crash happened on 1/14/11 at 9:51 in 4.0b4pre.
The stack beyond the top frame is probably completely bogus, let's assign this to general for now.
blocking2.0: ? → final+
Component: IPC → General
QA Contact: ipc → general
Actually, from about frame 11 on down looks totally sane. 2-10 are definitely off in the weeds, though. If someone can find a copy of that libz.so that matches one of these crash reports, we can probably get a more sensible stack out of it (even with just export symbols, there should be enough CFI to get us to the right caller frame).
Whiteboard: [hardblocker]
As far as I can tell, most if not all of the devices are using some sort of hacked firmware. All the ones that say 2.6.32.28.S10.4.OC-ga36929b-dirty, for example, are using some cyanogen gingerbread, and a bunch more actually identify themselves as cyanogen in the kernel version string.
We're crashing due to a cyanogen specific change in zlib: https://github.com/CyanogenMod/android_external_zlib/commit/e6981afc21ff7b315c945b062763db11ef231ef4
As an android only issue, this probably isn't blocking2.0. (but blocking-fennec, certainly)
blocking2.0: final+ → ?
2.0 and fennec are roughly the same thing. Although given what we know, is this just a cyanogen bug found by people who aren't running production devices?
blocking2.0: ? → final+
(In reply to comment #12)
> 2.0 and fennec are roughly the same thing. Although given what we know, is this
> just a cyanogen bug found by people who aren't running production devices?

This is a cyanogen bug found by people running production devices. Cyanogen, however, is a non-stock, non-production Android build/firmware that features a great number of non-upstream changes. One of them is a change to zlib which crashes us. Apparently, a large number of our nightly users like to run cyanogen.

The particular bug that's biting us/them should be addressed "upstream" with the cyanogen devs IMHO, since we do want to take advantage of zlib optimization where it exists.
I can confirm that CM7 (the Gingerbread version) is affected. When the bug hits, it hits a large number of times in a row and creates a crash report for each, which is probably why the number of crashes is so large. I'm easily looking at over 50 reports, six of which have bp- ID numbers, from the last time this struck.

Due to CM policy, I can't file an issue in their tracker, but I will go ahead and put a patch up on CM gerrit to revert the change mwu identified and bring it to the attention of the core team.
I've reverted the patch, and contacted the author with a link to the crash report and this bug report. Nobody wants to break apps, thanks for the heads up.
I can now also confirm that the rollback of the zlib optimization in CyanogenMOD has cleared this up for me.
If you ever see any insanity like this in the future on a CM build of Android, feel free to contact me directly. CM7 isn't actually released yet; we only have nightly builds that aren't RC yet. We usually ask that people not file bugs on nightlies because they are almost always feature requests or things in rapid development, but app breakage is a different story, especially for native apps like Fennec.
Let's resolve this, then.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Keywords: relnote
We're now seeing this in builds other than old CM7 nightlies, including a report that it is in the official version of Android 2.3.3 for the new HTC Desire S:

https://support.mozilla.com/en-US/questions/803031

This is the #1 topcrasher for Fennec 4.0.  Re-opening and nominating for 4.0.1.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [hardblocker] → [hardblocker][4.0.1?]
Given this looks to be our #1 crasher, +'ing for 4.0.1
tracking-fennec: 2.0+ → 4.0.1+
Bummer, I wish CodeAurora Forum had responded when I tried to contact them about the ABI break.
Attached patch WIP (obsolete) — Splinter Review
This disables system zlib, and tries to fix the build to work with the in-tree zlib instead.  But freetype still fails to build without system zlib, and I haven't yet figured out how to fix that.  If someone who understands the build system wants to take this, please do.
Attached patch Optimize script reading (obsolete) — Splinter Review
Well, for some reason, this optimized zlib copy doesn't like it when we decompress in 8 KB chunks. Which... is fine, since reading everything at once is how we do it everywhere else, and it involves fewer copies and fewer lines of code.
Attachment #523344 - Attachment is obsolete: true
Attachment #523660 - Flags: review?(Olli.Pettay)
Comment on attachment 523660 [details] [diff] [review]
Optimize script reading

>+    if (!buffer ||
>+        NS_FAILED(input->Read(buffer, avail, &read)) ||
>+        read != avail) {
>+      return;
I asked biesi about this: it currently works, but
it is not promised by the contract.
So it is better to call Read in a loop until it returns 0.

Could you update the patch?
Attachment #523660 - Flags: review?(Olli.Pettay)
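The review point above, calling Read in a loop until it returns 0 rather than trusting a single call to deliver every available byte, can be sketched as follows. This is a minimal illustration, not the patch from this bug: `ChunkedStream` and `ReadFully` are hypothetical names, and `ChunkedStream` is only a stand-in that deliberately returns short reads. It is not nsIInputStream and makes no claim about that interface beyond what the review comment says.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for an input stream whose Read() may hand back
// fewer bytes than requested. It only mimics short reads for illustration.
class ChunkedStream {
 public:
  ChunkedStream(std::string data, std::size_t maxChunk)
      : mData(std::move(data)), mMaxChunk(maxChunk) {}

  // Copies at most `count` bytes into `buf`, and never more than mMaxChunk
  // per call. Returns false on error (this stand-in never fails);
  // *read == 0 signals end of stream.
  bool Read(char* buf, std::size_t count, std::size_t* read) {
    assert(buf != nullptr && read != nullptr);
    std::size_t n = std::min({count, mMaxChunk, mData.size() - mPos});
    std::memcpy(buf, mData.data() + mPos, n);
    mPos += n;
    *read = n;
    return true;
  }

 private:
  std::string mData;
  std::size_t mMaxChunk;
  std::size_t mPos = 0;
};

// Keeps calling Read() until `expected` bytes have arrived or Read()
// reports 0 bytes. Returns true only if exactly `expected` bytes were read.
bool ReadFully(ChunkedStream& in, std::size_t expected, std::string& out) {
  out.assign(expected, '\0');
  std::size_t total = 0;
  while (total < expected) {
    std::size_t got = 0;
    if (!in.Read(&out[total], expected - total, &got)) {
      return false;  // read error
    }
    if (got == 0) {
      break;  // end of stream before `expected` bytes arrived
    }
    total += got;
  }
  out.resize(total);
  return total == expected;
}
```

With a stream that yields at most 4 bytes per call, a single Read for the full length would come back short and a `read != avail` check would bail out, while `ReadFully` still collects the whole buffer. A caller would still treat a `false` return as failure, as the quoted patch does.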
(In reply to comment #19)
> We're now seeing this in builds other than old CM7 nightlies, including a
> report that it is in the official version of Android 2.3.3. for the new HTC
> Desire S:

I can confirm this on my HTC Desire S with both Fennec 4.0 and Fennec nightlies (last tried with a 20110406xx build). The device was also reset, but the error still shows up on a freshly installed device.

One crash report (others were throttled):
http://crash-stats.mozilla.com/report/index/bp-16ce1dd4-723d-48e3-a6d8-c1dcd2110405
Attached patch Optimize script reading, v2 (obsolete) — Splinter Review
Attachment #523660 - Attachment is obsolete: true
Attachment #524242 - Flags: review?(Olli.Pettay)
Comment on attachment 524242 [details] [diff] [review]
Optimize script reading, v2

So why can't you just use NS_ReadInputStreamToString, which I linked to?
  rv = NS_ReadInputStreamToString(input, data, avail);
  if (NS_FAILED(rv)) {
    return;
  }
Attachment #524242 - Attachment is obsolete: true
Attachment #524250 - Flags: review?(Olli.Pettay)
Attachment #524242 - Flags: review?(Olli.Pettay)
Attachment #524250 - Flags: review?(Olli.Pettay) → review+
http://hg.mozilla.org/mozilla-central/rev/ee12989404ec
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
This should also be landed on mozilla-2.1.
Whiteboard: [hardblocker][4.0.1?] → [hardblocker][needs to land on mozilla-2.1]
Keywords: checkin-needed
http://hg.mozilla.org/releases/mozilla-2.1/rev/d0344a994a44
Keywords: checkin-needed
Whiteboard: [hardblocker][needs to land on mozilla-2.1] → [hardblocker]
Target Milestone: --- → mozilla5
Verified Fixed using a Desire S Mozilla/5.0 (Android; Linux armv7l; rv:2.1.1) Gecko/20110415 Firefox/4.0.2pre Fennec/4.0.1 ID:20110415172201
Status: RESOLVED → VERIFIED
v. Mozilla/5.0 (Android; Linux armv7l; rv:6.0a1) Gecko/20110419 Firefox/6.0a1 Fennec/6.0a1 ID:20110419042214
Crash Signature: [@ libz.so@0x14b4 ] [@ libz.so@0x14d4 ]