Closed
Bug 741836
Opened 13 years ago
Closed 12 years ago
Native crash without backtrace in SQLiteBridge
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(blocking-fennec1.0 -)
RESOLVED
FIXED
Firefox 14
Tracking | Status | |
---|---|---|
blocking-fennec1.0 | --- | - |
People
(Reporter: gcp, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [native-crash])
Attachments
(1 file)
7.00 KB,
text/plain
|
Details |
Bug 736237 makes Profile Migration resumable. When testing this, I ran into an issue where the app spontaneously exits without a Java or native crash trace. Attaching jdb changes nothing. Attaching gdb makes the application work correctly. Using checkJNI makes the application work correctly. Tracing with printf debugging shows that the app exits at this statement:
https://mxr.mozilla.org/mozilla-central/source/mozglue/android/SQLiteBridge.cpp#342
This reeks of memory corruption. It would be cool if someone could valgrind this scenario and see if we have bugs.
STR:
1) Apply the patches from bug 736237.
2) Apply the attached patch.
3) Copy a big places.sqlite into the Fennec profile. (eg. from desktop machine)
4) Start Fennec, let migration finish.
5) Quit Fennec
6) Start Fennec. Fennec will crash halfway.
Updated•13 years ago
|
blocking-fennec1.0: --- → ?
Reporter | ||
Comment 1•13 years ago
|
||
We didn't reproduce the issue under valgrind yet, but there is something interesting in the trace:
I//data/local/start_valgrind_fennec( 1608): ==1609== Thread 11:
I//data/local/start_valgrind_fennec( 1608): ==1609== Invalid read of size 1
I//data/local/start_valgrind_fennec( 1608): ==1609== at 0x4806620: strlen (mc_replace_strmem.c:391)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x4831AAD: strndup (strndup.c:33)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CB0B: report_mapping (APKOpen.cpp:681)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x22251103: CustomElf::Load(Mappable*, char const*, int) (CustomElf.cpp:211)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224FDC3: ElfLoader::Load(char const*, int, LibHandle*) (ElfLoader.cpp:251)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224FE8B: __wrap_dlopen (ElfLoader.cpp:46)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CDF3: loadSQLiteLibs(char const*) (APKOpen.cpp:810)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CEBD: Java_org_mozilla_gecko_GeckoAppShell_loadSQLiteLibsNative (APKOpen.cpp:945)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x540CBF3: dvmPlatformInvoke (CallEABI.S:258)
I//data/local/start_valgrind_fennec( 1608): ==1609== Address 0x4cb7082 is 0 bytes after a block of size 658 alloc'd
I//data/local/start_valgrind_fennec( 1608): ==1609== at 0x4805440: malloc (vg_replace_malloc.c:263)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224C46D: extractBuf(char const*, Zip*) (APKOpen.cpp:648)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CDC9: loadSQLiteLibs(char const*) (APKOpen.cpp:804)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CEBD: Java_org_mozilla_gecko_GeckoAppShell_loadSQLiteLibsNative (APKOpen.cpp:945)
I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x540CBF3: dvmPlatformInvoke (CallEABI.S:258)
Comment 2•13 years ago
|
||
(In reply to Gian-Carlo Pascutto (:gcp) from comment #1)
> We didn't reproduce the issue under valgrind yet, but there is something
> interesting in the trace:
>
> I//data/local/start_valgrind_fennec( 1608): ==1609== Thread 11:
> I//data/local/start_valgrind_fennec( 1608): ==1609== Invalid read of size 1
> I//data/local/start_valgrind_fennec( 1608): ==1609== at 0x4806620: strlen
> (mc_replace_strmem.c:391)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x4831AAD:
> strndup (strndup.c:33)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CB0B:
> report_mapping (APKOpen.cpp:681)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x22251103:
> CustomElf::Load(Mappable*, char const*, int) (CustomElf.cpp:211)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224FDC3:
> ElfLoader::Load(char const*, int, LibHandle*) (ElfLoader.cpp:251)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224FE8B:
> __wrap_dlopen (ElfLoader.cpp:46)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CDF3:
> loadSQLiteLibs(char const*) (APKOpen.cpp:810)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CEBD:
> Java_org_mozilla_gecko_GeckoAppShell_loadSQLiteLibsNative (APKOpen.cpp:945)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x540CBF3:
> dvmPlatformInvoke (CallEABI.S:258)
> I//data/local/start_valgrind_fennec( 1608): ==1609== Address 0x4cb7082 is 0
> bytes after a block of size 658 alloc'd
> I//data/local/start_valgrind_fennec( 1608): ==1609== at 0x4805440: malloc
> (vg_replace_malloc.c:263)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224C46D:
> extractBuf(char const*, Zip*) (APKOpen.cpp:648)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CDC9:
> loadSQLiteLibs(char const*) (APKOpen.cpp:804)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x2224CEBD:
> Java_org_mozilla_gecko_GeckoAppShell_loadSQLiteLibsNative (APKOpen.cpp:945)
> I//data/local/start_valgrind_fennec( 1608): ==1609== by 0x540CBF3:
> dvmPlatformInvoke (CallEABI.S:258)
That's bug 740318.
Reporter | ||
Comment 3•13 years ago
|
||
Julian valgrinded the exact case that's failing for me but nothing showed up.
Comment 4•13 years ago
|
||
GCP, I thought this was blocking bug 736237 from landing in which case it would be a blocker. Bug, bug 736237 landed. Can you explain?
Reporter | ||
Comment 5•13 years ago
|
||
We didn't find anything even after extensive valgrinding of the exact scenario that crashes for me (which didn't fail for sewardj, anyway), and nobody had any further idea how to progress.
I figured that landing this would either get us some more reports and help to identify a pattern, and in the alternate case, show that this crash almost never happens.
Put differently, the alternative to landing was not doing anything.
I got an idea in the meantime to let this run and don't pass the data we get off SQLite into Android, to see if it's not some Android lib going bonkers.
Reporter | ||
Comment 6•13 years ago
|
||
I managed to crash it while stepping in the debugger:
full log: http://pastebin.mozilla.org/1560695
last part:
sqlite3VdbeExec (p=0x5c5579d8) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:65422
65422 if( p->rc==SQLITE_NOMEM ){
(gdb)
65427 assert( p->rc==SQLITE_OK || p->rc==SQLITE_BUSY );
(gdb)
65428 p->rc = SQLITE_OK;
(gdb)
65429 assert( p->explain==0 );
(gdb)
65430 p->pResultSet = 0;
(gdb)
65431 db->busyHandler.nBusy = 0;
(gdb)
65432 CHECK_FOR_INTERRUPT;
(gdb)
65435 checkProgress = db->xProgress!=0;
(gdb)
65438 sqlite3BeginBenignMalloc();
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0xaff15c2c in ?? () from /home/morbo/jimdb/moz-gdb/lib/42800C743000157/lib/libc.so
Reporter | ||
Comment 7•13 years ago
|
||
And another one:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 15437]
0xaff15c2c in ?? () from /home/morbo/jimdb/moz-gdb/lib/42800C743000157/lib/libc.so
(gdb) bt
#0 0xaff15c2c in ?? () from /home/morbo/jimdb/moz-gdb/lib/42800C743000157/lib/libc.so
#1 0xaff1cd9c in __assert2 () from /home/morbo/jimdb/moz-gdb/lib/42800C743000157/lib/libc.so
#2 0x5dabc4ee in sqlite3MutexAlloc (id=2) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:17352
#3 0x5db46990 in enterMutex () at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:115604
#4 0x5db46b94 in sqlite3ConnectionUnlocked (db=0x5c422408) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:115725
#5 0x5daf031e in sqlite3VdbeHalt (p=0x5c4579d8) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:61797
#6 0x5daf5afe in sqlite3VdbeExec (p=0x5c4579d8) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:65712
#7 0x5daf2704 in sqlite3Step (p=0x5c4579d8) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:63161
#8 0x5daf2946 in sqlite3_step (pStmt=0x5c4579d8) at /home/morbo/hg/mozilla-central/db/sqlite3/src/sqlite3.c:63234
#9 0x80b2a1ca in Java_org_mozilla_gecko_sqlite_SQLiteBridge_sqliteCall (jenv=0x117c68, jDb=0x408bb1f8, jQuery=0x407b0690,
jParams=0x40939318, jQueryRes=0x409393e8) at /home/morbo/hg/mozilla-central/mozglue/android/SQLiteBridge.cpp:362
#10 0xaca11d38 in ?? ()
#11 0xaca11d38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
This actually makes a little sense; the crash always happens at the last "step" inside the query, so this would seem consistent with the crash originating from sqlite3VdbeHalt.
Comment 8•13 years ago
|
||
Is this bug related to bug 743715
Reporter | ||
Comment 9•13 years ago
|
||
I needed bug 743715 to provide further evidence that https://bugzilla.mozilla.org/show_bug.cgi?id=730495#c2 is correct.
Reporter | ||
Comment 10•13 years ago
|
||
Disabling jemalloc makes the crasher go away (this isn't 100% proof that jemalloc is the cause, tho).
Comment 11•13 years ago
|
||
That would seem to suggest something is allocating memory with the system allocator, and sqlite is giving that to some jemalloc function.
Reporter | ||
Comment 12•13 years ago
|
||
Mike, one suspect I am looking at is:
http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageService.cpp#574
Which would trigger because the SQLiteBridge is called before Gecko is loaded, so it will be calling into sqlite3 without these wrappers, providing incorrect alignment.
But I'd expect that to trigger some assert in sqlite3 itself, at least that's what the bugs that caused this wrapper to be added suggest. So I'm not 100% sure.
Reporter | ||
Comment 13•13 years ago
|
||
Bug 722884 has some similarities to this one. Unsure if related.
Updated•13 years ago
|
blocking-fennec1.0: ? → -
Updated•13 years ago
|
Comment 14•13 years ago
|
||
(In reply to Gian-Carlo Pascutto (:gcp) from comment #10)
> Disabling jemalloc makes the crasher go away
That might explain why the various valgrind runs found nothing -- I was
using a non-jemalloc build.
Reporter | ||
Comment 15•13 years ago
|
||
I put some logging into SQLite itself:
http://pastebin.mozilla.org/1565953
The sqlite3MutexEnd call is probably what kills us. This is called from sqlite3_shutdown. Richard informed me yesterday that our calling of this function (which was in his opinion unnecessary) could be a cause. Looks like he's right!
Theory: the failed sqlite3_config call causes the storage service to shut down and sqlite3_shutdown to be called.
Reporter | ||
Comment 16•13 years ago
|
||
This is probably fixed by bug 721784.
Reporter | ||
Updated•12 years ago
|
Updated•12 years ago
|
Target Milestone: --- → Firefox 14
Assignee | ||
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•