Closed Bug 529995 Opened 10 years ago Closed 10 years ago

startup crash [@ columnName] in sqlite

Categories

(Thunderbird :: General, defect, critical)

x86
All
defect
Not set
critical

Tracking

(thunderbird3.0 .1-fixed)

VERIFIED FIXED
Thunderbird 3
Tracking Status
thunderbird3.0 --- .1-fixed

People

(Reporter: wsmwk, Assigned: asuth)

References

Details

(Keywords: crash, regression, Whiteboard: [fixed RC1 build 3])

Crash Data

Attachments

(1 file, 1 obsolete file)

[from crash-stats]
startup crash [@ columnName] beginning with 20091117 3.0rc1 build2. All OS.
so far, no crashes with 20091118 nightly build. 

all but 2 crashes are startup 

bp-60704b3a-a9e9-4725-ad3b-7fce22091118 startup
0	sqlite3.dll	columnName	 db/sqlite3/src/sqlite3.c:51036
1	sqlite3.dll	sqlite3_column_name16	db/sqlite3/src/sqlite3.c:51065
2	thunderbird.exe	mozilla::storage::StatementWrapper::Initialize	storage/src/mozStorageStatementWrapper.cpp:88
3	xpcom_core.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101
4	thunderbird.exe	XPCWrappedNative::CallMethod	js/src/xpconnect/src/xpcwrappednative.cpp:2456 


bp-0dedf17a-7888-4cf2-9fd5-14c4b2091118 after 316 seconds
0	libsqlite3.dylib	columnName	 db/sqlite3/src/sqlite3.c:51033
1	thunderbird-bin	mozilla::storage::StatementWrapper::Initialize	storage/src/mozStorageStatementWrapper.cpp:88
2	libxpcom_core.dylib	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
3	thunderbird-bin	XPCWrappedNative::CallMethod	js/src/xpconnect/src/xpcwrappednative.cpp:2456
4	thunderbird-bin	XPC_WN_CallMethod	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1590
5	libmozjs.dylib	js_Invoke	js/src/jsinterp.cpp:1386
6	libmozjs.dylib	js_Interpret	js/src/jsinterp.cpp:5179
7	libmozjs.dylib	js_Invoke	js/src/jsinterp.cpp:1394
8	thunderbird-bin	nsXPCWrappedJSClass::CallMethod	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1697
9	thunderbird-bin	nsXPCWrappedJS::CallMethod	js/src/xpconnect/src/xpcwrappedjs.cpp:569
Flags: blocking-thunderbird3?
The first stack is too short to know what is actually happening there.

In the second stack, the root call that provides us with some context is nsMsgIncomingServer.cpp::728, which is:
  rv = loginMgr->FindLogins(&numLogins, currServer, EmptyString(),
                            currServer, &logins);
My speculative theory is that during an attempted upgrade of signons.sqlite by mozilla/toolkit/components/passwordmgr/src/storage-mozStorage.js it is creating an illegal statement.  It appears the code intentionally does this to figure out what it needs to do.

Unfortunately, out of necessity my changes defer the actual creation of the statement.  The passwordmgr code explicitly creates a mozilla::storage::StatementWrapper.  It calls nativeStatement() which actually attempts to construct the statement and fails, returning a null pointer.  Also unfortunately, the code does not check that the returned value is non-null because there was previously no chance of this happening.

sqlite3 gets handed a null pointer, bingo bingo bango, crash.

This is very suck and very much my bad.  It is the only instance of its kind that I can see; nativeStatement was a hack that only still existed for the wrapper dude; no other path exists with that type of deep rep-exposure.

A re-spin of the release candidate is likely necessary.

Thank you very very much for catching this and focusing my attention to this, Wayne.
This patch should resolve the issue.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Attachment #413535 - Flags: superreview?(bugzilla)
Attachment #413535 - Flags: review?(bugzilla)
the full list http://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&date=&range_value=4&range_unit=weeks&query_search=signature&query_type=exact&query=columnName&do_query=1

spot checking the 34 windows crashes, none are interesting.
the other mac crasher bp-a85fac68-abee-486d-87d9-3bb322091118 matches comment 0, might even be same user (FF has gone wonky, so I'm not verifying ATM)
(In reply to comment #3)
> Created an attachment (id=413535) [details]
> statement wrapper de-dangerification v1

I can understand this fixing the issue, but I believe from analysis it should also fix the failing test_storage_mozStorage_migrate.js test, but it doesn't, so I'm a bit confused as to what is going on there.
(In reply to comment #5)
> I can understand this fixing the issue, but I believe from analysis it should
> also fix the failing test_storage_mozStorage_migrate.js test, but it doesn't,
> so I'm a bit confused as to what is going on there.

The same would apply to test_db_update_v999b.js, and in theory test_db_update_v1.js should pass as well but I don't know what's going there.
It would appear that a surprising amount of code's migration logic generates SQL statements to test if a column is present.  My change breaks logic that only creates the statement and does not try and execute the statement or otherwise use it.  So far, this seems to be most migration logic.  (This does not affect non-migration logic because that's the only case it makes sense to try and compile illegal SQL.)

The crashy case of passwordmgr had failing but non-crashing tests because it does not create statement wrappers in the migration test cases, just in general use.

I have fixed passwordmgr and am in the process of patching all the other mozilla/ cases.  The changes should be fairly minimal in each case, but it does raise the question of whether the right fix is to do the proper long-term 1.9.2 implementation.  The reason I had avoided that in the first place was that it would not be a transparent change to gloda.
This makes those tests pass for me as well as all the tests for the other code I modified after inspection.  I fixed-up places too, but I'm unsure how to test it... my suspicion is we don't build it and I did too much work like a sucker.

I'm not crazy about my changes having required this, but it's still probably the best option for the current moment.
Attachment #413535 - Attachment is obsolete: true
Attachment #413617 - Flags: superreview?(bugzilla)
Attachment #413617 - Flags: review?(bugzilla)
Attachment #413535 - Flags: superreview?(bugzilla)
Attachment #413535 - Flags: review?(bugzilla)
yeah, we don't build places, afaik.
Blocks: qa-tb3.0rc1
Flags: blocking-thunderbird3? → blocking-thunderbird3+
With my patch, doing "make xpcshell-tests" in my objdir nets me:
INFO | Result summary:
INFO | Passed: 565
INFO | Failed: 0

Which is reasonably promising.
I think I can also characterize what the crash was, too...

As noted, the migration code does not construct the statement wrapper that is the source of the crash.  And the statement wrapper only crashes if it is given illegal SQL.  So I think the part where I broke the migration logic tricked the system into thinking it did not need to run migration.  Then, it tried to create a statement wrapper on what was illegal SQL because of the lack of migration, causing the crash.

My changes to the storage-mozStorage.js file that make the migration logic correct do this by attempting to create a statement wrapper.  This forces the construction of the SQL statement (which fails), while also testing that we no longer crash.  (In other words, without the fix I made, the newly changed unit tests would crash and burn, leading to a fail.)
Attachment #413617 - Flags: superreview?(bugzilla)
Attachment #413617 - Flags: superreview+
Attachment #413617 - Flags: review?(bugzilla)
Attachment #413617 - Flags: review+
Attachment #413617 - Flags: approval-thunderbird3+
Pushed to TB 3.0 rc1 relbranch (only!):
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6dc036c10334
Group: mozilla-confidential
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Group: mozilla-confidential
Whiteboard: [fixed RC1 build 3]
Target Milestone: --- → Thunderbird 3
Whiteboard: [fixed RC1 build 3] → [fixed RC1 build 3][fixedtb301]
Whiteboard: [fixed RC1 build 3][fixedtb301] → [fixed RC1 build 3]
You pasted two sets of crash-stats things.  The address book thing is mork, not SQLite.

The other thing looks like all but 1 of the crashes are from one person whose database is corrupt, possibly because they have all kinds of weird DLLs in their process that should not be there.  The other 1 also looks like it has stuff in there that should not be in there.

Thank you for bringing these to my attention, though.  It's unfortunate crash-stats doesn't have better analytical and aggregation tools built in.  Hopefully dbaron keeps at his correlation work.
(In reply to comment #14)

> Thank you for bringing these to my attention, though.  It's unfortunate
> crash-stats doesn't have better analytical and aggregation tools built in. 
> Hopefully dbaron keeps at his correlation work.

That's being worked on and getting in place.
Crash Signature: [@ columnName]
You need to log in before you can comment on or make changes to this bug.