Closed Bug 636555 Opened 13 years ago Closed 13 years ago

Installing Bugzilla Tweaks 1.8 leads to a crash

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(fennec2.0+)

RESOLVED DUPLICATE of bug 637915
Tracking Status
fennec 2.0+ ---

People

(Reporter: jdm, Assigned: dougt)

Details

(Keywords: crash)

Attachments

(1 file, 4 obsolete files)

Install bugzilla tweaks by searching in the addon screen and pressing "add to fennec".  Watch the crash reporter dialog appear.
This is on the 2/24 nightly for me.  about:crashes doesn't seem to show any new entries for some reason.
tracking-fennec: --- → ?
logcat shows this as soon as the log messages from the alertsProgressListener_OnProgress reach 100%:

W/dalvikvm(12390): ReferenceTable overflow (max=1024)
W/dalvikvm(12390): Last 10 entries in JNI pinned array reference table:
W/dalvikvm(12390):  1014: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1015: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1016: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1017: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1018: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1019: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1020: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1021: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1022: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390):  1023: 0x400328f0 cls=[C (20 bytes)
W/dalvikvm(12390): JNI pinned array reference table summary (1024 entries):
W/dalvikvm(12390):  1024 of [C 20B (1 unique)
W/dalvikvm(12390): Memory held directly by native code is 20 bytes
E/dalvikvm(12390): Failed adding to JNI pinned array ref table (1024 entries)
I/dalvikvm(12390): "Thread-11" prio=5 tid=21 RUNNABLE
I/dalvikvm(12390):   | group="main" sCount=0 dsCount=0 s=N obj=0x47c50158 self=0x13dc38
I/dalvikvm(12390):   | sysTid=12399 nice=0 sched=0/0 cgrp=default handle=1301368
I/dalvikvm(12390):   at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method)
I/dalvikvm(12390):   at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method)
I/dalvikvm(12390):   at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:324)
I/dalvikvm(12390):   at org.mozilla.gecko.GeckoApp$2.run(GeckoApp.java:165)
I/dalvikvm(12390): 
E/dalvikvm(12390): VM aborting
D/Zygote  ( 2190): Process 12390 exited cleanly (1)
I/WindowManager( 2254): WIN DEATH: Window{47b75ab8 SurfaceView paused=false}
I/WindowManager( 2254): WIN DEATH: Window{47b4a8d0 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/ActivityManager( 2254): Process org.mozilla.fennec (pid 12390) has died.
This stackoverflow question [1] claims that the cause is leaking strings.

[1] http://stackoverflow.com/questions/4959495/jni-reference-table-overflow
CCing the usual android embedding suspects.
on desktop i see:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid0-qbniplfdfa4lpdrjhac6vbqn20q-addon-kit-lib/context-menu.js", line 47, in 
    ].join(""));
Error: The context-menu module currently supports only Firefox.  In the future we would like it to support other applications, however.  Please see https://bugzilla.mozilla.org/show_bug.cgi?id=560716 for more information.

Not sure if it is related yet.
wfm on a g2.  build from yesterday.
bah.  i lie.

it doesn't happen every time, but it does happen more than 1/3 of the time.


logcat:

D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
D/szipinf (10666): Initializing inflate state
W/dalvikvm(10666): ReferenceTable overflow (max=1024)
W/dalvikvm(10666): Last 10 entries in JNI pinned array reference table:
W/dalvikvm(10666):  1014: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1015: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1016: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1017: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1018: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1019: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1020: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1021: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1022: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666):  1023: 0x400200f8 cls=[C (20 bytes)
W/dalvikvm(10666): JNI pinned array reference table summary (1024 entries):
W/dalvikvm(10666):  1024 of [C 20B (1 unique)
W/dalvikvm(10666): Memory held directly by tracked refs is 20 bytes
E/dalvikvm(10666): Failed adding to JNI pinned array ref table (1024 entries)
I/dalvikvm(10666): "Thread-12" prio=5 tid=11 RUNNABLE
I/dalvikvm(10666):   | group="main" sCount=0 dsCount=0 obj=0x4054c4d8 self=0x148e68
I/dalvikvm(10666):   | sysTid=10676 nice=0 sched=0/0 cgrp=default handle=1006664
I/dalvikvm(10666):   | schedstat=( 19562835692 3778076159 10072 )
I/dalvikvm(10666):   at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method)
I/dalvikvm(10666):   at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method)
I/dalvikvm(10666):   at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:324)
I/dalvikvm(10666):   at org.mozilla.gecko.GeckoApp$2.run(GeckoApp.java:165)
I/dalvikvm(10666): 
E/dalvikvm(10666): VM aborting




from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 10676]
0xaca401f8 in dvmAbort () from /home/dougt/tegra-gdb-20100902/lib/libdvm.so
(gdb) bt
#0  0xaca401f8 in dvmAbort () from /home/dougt/tegra-gdb-20100902/lib/libdvm.so
#1  0xaca44b54 in ?? () from /home/dougt/tegra-gdb-20100902/lib/libdvm.so
#2  0xaca44b54 in ?? () from /home/dougt/tegra-gdb-20100902/lib/libdvm.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

dvmAddToReferenceTable is failing.  

Sounds like a leak in android and we are overflowing a table.  

http://code.google.com/p/android/issues/detail?id=2123
http://code.google.com/p/android/issues/detail?id=8868

maybe we can work around.  What is confusing right now is why this addon installation is tickling this bug, and not everything else we do.
tracking-fennec: ? → 2.0+
reproducible by simply downloading the XPI from a http server (not using the built in UI).
to make matters worse, you can get into a state where restarting after the crash will trigger this abort.  this requires that you kill your profile.
Assignee: nobody → doug.turner
(In reply to comment #9)
> to make matters worse, you can get into a state where restarting after the
> crash will trigger this abort.  this requires that you kill your profile.

This is not actually true, in my experience.  I've found that Android seems to clear out the pinned table after some period of time (a couple minutes, maybe?) and I can get Fennec to start without problems after that.
<devils_advocate>
If this crash is limited to Jetpack SDK based add-ons, which Fennec technically doesn't support, then I don't think we should hard block on it. If you think we should hard block because of the potential underlying leak then I guess I could agree. Let's just not let other hard blockers stagnate while we put a lot of effort into this one.
</devils_advocate>
It is very reproducible if you call into:

AndroidBridge::GetMimeTypeFromExtensions
or
AndroidBridge::GetHandlersForMimeType

a few dozen times.  Looks like it is just leaky on the android side.
Attached patch log for android bridge (obsolete) — Splinter Review
super helpful when debugging these sorts of problem.
Attachment #515741 - Flags: review?(blassey.bugs)
Attached patch call local delete (obsolete) — Splinter Review
not strictly required, but it does hint to js that our local reference is not needed.  with this patch, we continue to crash.
Attachment #515742 - Flags: review?(blassey.bugs)
Comment on attachment 515741 [details] [diff] [review]
log for android bridge

I don' think we always want this extra traffic, please wrap them with their own macro like ALOGIME

https://mxr.mozilla.org/mozilla-central/source/widget/src/android/nsWindow.cpp#1554
Attached patch move js to the default list (obsolete) — Splinter Review
this moves js to the default mime list.
Attachment #515749 - Flags: review?(blassey.bugs)
Comment on attachment 515742 [details] [diff] [review]
call local delete

I can't review this. but note that you've got css twice in that list
Attachment #515742 - Flags: review?(blassey.bugs) → review?(cbiesinger)
Attachment #515749 - Flags: review?(blassey.bugs) → review?(cbiesinger)
Attachment #515741 - Flags: review?(blassey.bugs) → review-
Comment on attachment 515742 [details] [diff] [review]
call local delete

wrong patch?
Attachment #515742 - Flags: review?(cbiesinger)
Attachment #515749 - Flags: review?(cbiesinger) → review?(blassey.bugs)
Attachment #515742 - Attachment is obsolete: true
Comment on attachment 515749 [details] [diff] [review]
move js to the default list

bz,is this the right thing here?

Basically what happens on android is that when we unzip this XPI files, we ask the OS what the mime type is for each and every file.  There appears to be a bug where Android (Dalvik) leaks during these calls.

Forcing .js files to always have this mimetype does resolve the issue (since we never go out to the OS).

It isn't clear to me when we should ever ask the operating system if it should/can open a .js file.
Attachment #515749 - Flags: review?(blassey.bugs) → review?(bzbarsky)
Attached patch delete local references (obsolete) — Splinter Review
hints to js.
^^ hints to java.  i do not think that this is required (per docs).  I continue to crash with this applied.
Comment on attachment 515749 [details] [diff] [review]
move js to the default list

(ignore the extra +  { TEXT_CSS, "css" },)
Comment on attachment 515757 [details] [diff] [review]
delete local references

most of these functions have a AutoLocalJNIFrame in them, does that not handle releasing these local references?
blassey - i do not think any of these are needed.  these basically tell java that the native side is done with the string.  gc() should figure this out without this call.
(btw, i don't really want you to review that patch.)
Comment on attachment 515749 [details] [diff] [review]
move js to the default list

TEXT_CSS is already in the list, please don't add it again :)
(In reply to comment #19)
> It isn't clear to me when we should ever ask the operating system if it
> should/can open a .js file.

Why does unzipping a file call this code at all?
Attachment #515749 - Attachment is obsolete: true
Attachment #515749 - Flags: review?(bzbarsky)
Attachment #515821 - Flags: review?(bzbarsky)
Attachment #515757 - Attachment is obsolete: true
Attachment #515741 - Attachment is obsolete: true
sorry for so much noise.

-- Recap -- 

Bz, i'd like to move the .js mime type from being an extra mime type (which is looked at after going to the OS) to being a default in nsExternalHelperAppService.  See comment #19.

Most everything else can be ignored.
So historically we've tried to avoid adding things to this list unless absolutely needed for performance, because once it's on that list there is no way for the user to affect what type the extension gets mapped to.  From some quick googling, there are at least three things other than JavaScript that use this extension...

Note that a web site can trigger an arbitrary number of calls to GetTypeFromExtension pretty trivially.  So if we have a problem with that function, we really need to fix it.
Attachment #515821 - Flags: review?(bzbarsky) → review-
This is just a memory leak in our string class... fail fail fail.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: