Closed Bug 626051 Opened 9 years ago Closed 9 years ago

Fennec crash [@ ] [@ ] [ @ nsACString_internal::Replace | mozilla::ipc::AsyncChannel::OnDispatchMessage ]


(Core :: General, defect, critical)




Tracking Status
blocking2.0 --- final+
fennec 4.0.1+ ---


(Reporter: scoobidiver, Assigned: mwu)



(Keywords: crash, relnote, topcrash, Whiteboard: [hardblocker])



(1 file, 3 obsolete files)

This is a new crash signature in Fennec 4.0b3.
It has been the #4 top crasher in Fennec 4.0b3 over the last week.

UUID	b3a334be-117d-4a2f-aa64-f01ca2110114
Time 	2011-01-14 23:05:24.64066
Uptime	0
Install Age	131 seconds since version was first installed.
Product	Fennec
Version	4.0b3
Build ID	20101221205132
Branch	1.9
OS	Linux
OS Version	0.0.0 Linux #1 PREEMPT Wed Jan 12 15:08:49 CST 2011 armv7l
CPU	arm
Crash Reason	SIGSEGV
Crash Address	0xc6e0df26

Frame 	Module 	Signature 	Source
1 	org.mozilla.firefox-1.apk 	org.mozilla.firefox-1.apk@0x1adc00 	
2 	org.mozilla.firefox-1.apk 	org.mozilla.firefox-1.apk@0x1ade13 	
3 	plugin-container 	__gnu_unwind_pr_common 	unwind-arm.c:1225
4 	moz_malloc 	memory/mozalloc/mozalloc.cpp:109
5 		@0xbecf0900 	
10 	nsACString_internal::Replace 	xpcom/string/src/nsTSubstring.cpp:488
11 	nsFrameScriptExecutor::LoadFrameScriptInternal 	content/base/src/nsFrameMessageManager.cpp:659
12 	mozilla::dom::TabChild::RecvLoadRemoteScript 	dom/ipc/TabChild.cpp:749
13 	mozilla::dom::PBrowserChild::OnMessageReceived 	PBrowserChild.cpp:1211
14 	mozilla::dom::PContentChild::OnMessageReceived 	PContentChild.cpp:949
15 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
16 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:440
17 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
18 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:475
19 	MessageLoop::RunTask 	ipc/chromium/src/base/
20 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/
21 	MessageLoop::DoWork 	ipc/chromium/src/base/
22 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:115
23 	mozilla::ipc::MessagePumpForChildProcess::Run 	ipc/glue/MessagePump.cpp:230
24 	MessageLoop::RunInternal 	ipc/chromium/src/base/
25 	MessageLoop::Run 	ipc/chromium/src/base/
26 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:198
27 	XRE_RunAppShell 	toolkit/xre/nsEmbedFunctions.cpp:631
28 	mozilla::ipc::MessagePumpForChildProcess::Run 	ipc/glue/MessagePump.cpp:222
29 	MessageLoop::RunInternal 	ipc/chromium/src/base/
30 	MessageLoop::Run 	ipc/chromium/src/base/
31 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp:510
32 	ChildProcessInit 	other-licenses/android/APKOpen.cpp:691
33 	plugin-container 	main 	ipc/app/MozillaRuntimeMainAndroid.cpp:69

More reports at:
tracking-fennec: --- → ?
I find it particularly strange how 779 crashes showed up in the past two days for a previously unknown signature.
Actually, the fact that it showed up on both 4.0b3 and 4.0b4pre at the same time leads me to suspect that something external changed, like a system upgrade perhaps?
It is now the #1 top crasher in Fennec 4.0b3 (33% of all crashes), so comment 2 is good.
I'm wondering if some set of devices got an update with a new libz that we don't get along with. Unfortunately, I don't see any crash reports with device information so we can't narrow down which devices are affected. 

I wonder if using our own libz would make this go away.
Among crashes, debug identifier is mainly:
Sometimes: D85FCD31A7FE605564DE920A8F1167E90
blocking2.0: --- → ?
tracking-fennec: ? → 2.0+
Assignee: nobody → mwu
First crash happened on 1/14/11 at 4:02 in 4.0b3.
First crash happened on 1/14/11 at 9:51 in 4.0b4pre.
The stack beyond the top frame is probably completely bogus; let's assign this to General for now.
blocking2.0: ? → final+
Component: IPC → General
QA Contact: ipc → general
Actually, from about frame 11 on down looks totally sane. 2-10 are definitely off in the weeds, though. If someone can find a copy of that that matches one of these crash reports, we can probably get a more sensible stack out of it (even with just export symbols, there should be enough CFI to get us to the right caller frame).
Whiteboard: [hardblocker]
As far as I can tell, most if not all of the devices are using some sort of hacked firmware. All the ones that say , for example, are using some cyanogen gingerbread, and a bunch more actually identify themselves as cyanogen in the kernel version string.
As an Android-only issue, this probably isn't blocking2.0 (but blocking-fennec, certainly).
blocking2.0: final+ → ?
2.0 and fennec are roughly the same thing. Although given what we know, is this just a cyanogen bug found by people who aren't running production devices?
blocking2.0: ? → final+
(In reply to comment #12)
> 2.0 and fennec are roughly the same thing. Although given what we know, is this
> just a cyanogen bug found by people who aren't running production devices?

This is a cyanogen bug found by people running production devices. Cyanogen, however, is a non-stock, non-production Android build/firmware that features a great number of non-upstream changes. One of them is a change to zlib which crashes us. Apparently, a large number of our nightly users like to run cyanogen.

The particular bug that's biting us/them should be addressed "upstream" with the cyanogen devs IMHO, since we do want to take advantage of zlib optimization where it exists.
I can confirm that CM7 (the Gingerbread version) is affected. When the bug hits, it hits a large number of times in a row and creates a crash report for each, which is probably why the number of crashes is so large. I'm easily looking at over 50 reports, six of which have bp- ID numbers, from the last time this struck.

Due to CM policy, I can't file an issue in their tracker, but I will go ahead and put a patch up on CM gerrit to revert the change mwu identified and bring it to the attention of the core team.
I've reverted the patch, and contacted the author with a link to the crash report and this bug report. Nobody wants to break apps, thanks for the heads up.
I can now also confirm that the rollback of the zlib optimization in CyanogenMOD has cleared this up for me.
If you ever see any insanity like this in the future on a CM build of Android, feel free to contact me directly. CM7 isn't actually released yet; we only have nightly builds that aren't RC yet. We usually ask that people not file bugs on nightlies because they are almost always feature requests or things in rapid development, but app breakage is a different story, especially for native apps like Fennec.
Let's resolve this, then.
Closed: 9 years ago
Resolution: --- → WORKSFORME
Keywords: relnote
We're now seeing this in builds other than old CM7 nightlies, including a report that it is in the official version of Android 2.3.3 for the new HTC Desire S:

This is the #1 topcrasher for Fennec 4.0.  Re-opening and nominating for 4.0.1.
Resolution: WORKSFORME → ---
Whiteboard: [hardblocker] → [hardblocker][4.0.1?]
Given this looks to be our #1 crasher, +'ing for 4.0.1
tracking-fennec: 2.0+ → 4.0.1+
Bummer, I wish CodeAurora Forum had responded when I tried to contact them about the ABI break.
Attached patch WIP (obsolete) — Splinter Review
This disables system zlib, and tries to fix the build to work with the in-tree zlib instead.  But freetype still fails to build without system zlib, and I haven't yet figured out how to fix that.  If someone who understands the build system wants to take this, please do.
Attached patch Optimize script reading (obsolete) — Splinter Review
Well, for some reason, this optimized zlib copy doesn't like it when we decompress in 8 KB chunks. Which... is fine, since reading all at once is how we do it everywhere else and it involves fewer copies and fewer lines of code.
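For readers following along, here is a minimal sketch of the two read strategies contrasted in this comment. It is not the attached patch: `FakeStream`, `ReadChunked`, and `ReadAllAtOnce` are hypothetical names over an in-memory buffer, used only to show why the one-shot form needs fewer copies and fewer lines. No zlib or XPCOM code is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for an input stream; not a Mozilla API.
class FakeStream {
public:
  explicit FakeStream(std::string data) : mData(std::move(data)) {}

  // Number of bytes not yet consumed.
  std::size_t Available() const { return mData.size() - mPos; }

  // Copies up to `count` bytes into `buf`; returns bytes copied (0 at end).
  std::size_t Read(char* buf, std::size_t count) {
    std::size_t n = std::min(count, Available());
    std::memcpy(buf, mData.data() + mPos, n);
    mPos += n;
    return n;
  }

private:
  std::string mData;
  std::size_t mPos = 0;
};

// Chunked strategy: fixed scratch buffer, each chunk appended to the result.
std::string ReadChunked(FakeStream& in, std::size_t chunkSize = 8192) {
  assert(chunkSize > 0);
  std::vector<char> scratch(chunkSize);
  std::string out;
  std::size_t n;
  while ((n = in.Read(scratch.data(), scratch.size())) > 0) {
    out.append(scratch.data(), n);  // one extra copy per chunk
  }
  return out;
}

// One-shot strategy: size the destination once and read straight into it.
std::string ReadAllAtOnce(FakeStream& in) {
  std::string out(in.Available(), '\0');
  std::size_t n = in.Read(&out[0], out.size());
  out.resize(n);
  return out;
}
```

Both functions return the same bytes for this fake stream; the difference is only in how many intermediate copies and loop iterations are needed.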
Attachment #523344 - Attachment is obsolete: true
Attachment #523660 - Flags: review?(Olli.Pettay)
Comment on attachment 523660 [details] [diff] [review]
Optimize script reading

>+    if (!buffer ||
>+        NS_FAILED(input->Read(buffer, avail, &read)) ||
>+        read != avail) {
>+      return;
I asked biesi about this and currently it works, but
it is not promised by the contract.
So better to call Read in a loop until it returns 0.

Could you update the patch?
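A minimal sketch of the "call Read in a loop" suggestion, assuming a stream whose Read may legally return fewer bytes than requested. `ShortReadStream` and `ReadFully` are hypothetical illustrations, not nsIInputStream or the code that eventually landed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stream that hands back at most `maxPerRead` bytes per call,
// imitating a Read that is allowed to return less than was requested.
class ShortReadStream {
public:
  ShortReadStream(std::string data, std::size_t maxPerRead)
    : mData(std::move(data)), mMaxPerRead(maxPerRead) {
    assert(maxPerRead > 0);
  }

  std::size_t Available() const { return mData.size() - mPos; }

  // Returns false on error (never happens here); *read is 0 at end of stream.
  bool Read(char* buf, std::size_t count, std::size_t* read) {
    std::size_t n = std::min({count, mMaxPerRead, Available()});
    std::memcpy(buf, mData.data() + mPos, n);
    mPos += n;
    *read = n;
    return true;
  }

private:
  std::string mData;
  std::size_t mMaxPerRead;
  std::size_t mPos = 0;
};

// Keeps calling Read until it reports 0 bytes or the buffer is full,
// instead of assuming a single call delivers everything.
bool ReadFully(ShortReadStream& in, std::string& out) {
  out.assign(in.Available(), '\0');
  std::size_t total = 0;
  while (total < out.size()) {
    std::size_t n = 0;
    if (!in.Read(&out[total], out.size() - total, &n)) {
      return false;  // propagate a read error
    }
    if (n == 0) {
      break;  // end of stream reached early
    }
    total += n;
  }
  out.resize(total);
  return true;
}
```

With a stream capped at 5 bytes per call, a single Read yields only 5 bytes, while `ReadFully` collects the whole payload. That gap is what a `read != avail` check after one call would turn into a silent early return.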
Attachment #523660 - Flags: review?(Olli.Pettay)
(In reply to comment #19)
> We're now seeing this in builds other than old CM7 nightlies, including a
> report that it is in the official version of Android 2.3.3 for the new HTC
> Desire S:

I can confirm this on my HTC Desire S with either Fennec 4.0 or Fennec nightlies (last tried with a 20110406xx build). The device was also reset, but the error still shows up on a freshly installed device.

One crash report (the others were throttled):
Attached patch Optimize script reading, v2 (obsolete) — Splinter Review
Attachment #523660 - Attachment is obsolete: true
Attachment #524242 - Flags: review?(Olli.Pettay)
Comment on attachment 524242 [details] [diff] [review]
Optimize script reading, v2

So why can you just use NS_ReadInputStreamToString which I linked to?
  rv = NS_ReadInputStreamToString(input, data, avail);
  if (NS_FAILED(rv)) {
why can't you...
Attachment #524242 - Attachment is obsolete: true
Attachment #524250 - Flags: review?(Olli.Pettay)
Attachment #524242 - Flags: review?(Olli.Pettay)
Attachment #524250 - Flags: review?(Olli.Pettay) → review+
Closed: 9 years ago
Resolution: --- → FIXED
This should also be landed on mozilla-2.1.
Whiteboard: [hardblocker][4.0.1?] → [hardblocker][needs to land on mozilla-2.1]
Keywords: checkin-needed
Keywords: checkin-needed
Whiteboard: [hardblocker][needs to land on mozilla-2.1] → [hardblocker]
Duplicate of this bug: 649588
Duplicate of this bug: 650803
Target Milestone: --- → mozilla5
Verified Fixed using a Desire S Mozilla/5.0 (Android; Linux armv7l; rv:2.1.1) Gecko/20110415 Firefox/4.0.2pre Fennec/4.0.1 ID:20110415172201
v. Mozilla/5.0 (Android; Linux armv7l; rv:6.0a1) Gecko/20110419 Firefox/6.0a1 Fennec/6.0a1 ID:20110419042214
Crash Signature: [@ ] [@ ]