Closed Bug 529995 Opened 10 years ago Closed 10 years ago
startup crash [@ column
Name] in sqlite
[from crash-stats] startup crash [@ columnName] beginning with 20091117 3.0rc1 build2. All OS. so far, no crashes with 20091118 nightly build. all but 2 crashes are startup bp-60704b3a-a9e9-4725-ad3b-7fce22091118 startup 0 sqlite3.dll columnName db/sqlite3/src/sqlite3.c:51036 1 sqlite3.dll sqlite3_column_name16 db/sqlite3/src/sqlite3.c:51065 2 thunderbird.exe mozilla::storage::StatementWrapper::Initialize storage/src/mozStorageStatementWrapper.cpp:88 3 xpcom_core.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101 4 thunderbird.exe XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:2456 bp-0dedf17a-7888-4cf2-9fd5-14c4b2091118 after 316 seconds 0 libsqlite3.dylib columnName db/sqlite3/src/sqlite3.c:51033 1 thunderbird-bin mozilla::storage::StatementWrapper::Initialize storage/src/mozStorageStatementWrapper.cpp:88 2 libxpcom_core.dylib NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179 3 thunderbird-bin XPCWrappedNative::CallMethod js/src/xpconnect/src/xpcwrappednative.cpp:2456 4 thunderbird-bin XPC_WN_CallMethod js/src/xpconnect/src/xpcwrappednativejsops.cpp:1590 5 libmozjs.dylib js_Invoke js/src/jsinterp.cpp:1386 6 libmozjs.dylib js_Interpret js/src/jsinterp.cpp:5179 7 libmozjs.dylib js_Invoke js/src/jsinterp.cpp:1394 8 thunderbird-bin nsXPCWrappedJSClass::CallMethod js/src/xpconnect/src/xpcwrappedjsclass.cpp:1697 9 thunderbird-bin nsXPCWrappedJS::CallMethod js/src/xpconnect/src/xpcwrappedjs.cpp:569
The first stack is too short to know what is actually happening there. In the second stack, the root call that provides us with some context is nsMsgIncomingServer.cpp::728, which is: rv = loginMgr->FindLogins(&numLogins, currServer, EmptyString(), currServer, &logins);
My speculative theory is that during an attempted upgrade of signons.sqlite by mozilla/toolkit/components/passwordmgr/src/storage-mozStorage.js it is creating an illegal statement. It appears the code intentionally does this to figure out what it needs to do. Unfortunately, out of necessity my changes defer the actual creation of the statement. The passwordmgr code explicitly creates a mozilla::storage::StatementWrapper. It calls nativeStatement() which actually attempts to construct the statement and fails, returning a null pointer. Also unfortunately, the code does not check that the returned value is non-null because there was previously no chance of this happening. sqlite3 gets handed a null pointer, bingo bingo bango, crash. This is very suck and very much my bad. It is the only instance of its kind that I can see; nativeStatement was a hack that only still existed for the wrapper dude; no other path exists with that type of deep rep-exposure. A re-spin of the release candidate is likely necessary. Thank you very very much for catching this and focusing my attention to this, Wayne.
This patch should resolve the issue.
the full list http://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&date=&range_value=4&range_unit=weeks&query_search=signature&query_type=exact&query=columnName&do_query=1 spot checking the 34 windows crashes, none are interesting. the other mac crasher bp-a85fac68-abee-486d-87d9-3bb322091118 matches comment 0, might even be same user (FF has gone wonky, so I'm not verifying ATM)
(In reply to comment #3) > Created an attachment (id=413535) [details] > statement wrapper de-dangerification v1 I can understand this fixing the issue, but I believe from analysis it should also fix the failing test_storage_mozStorage_migrate.js test, but it doesn't, so I'm a bit confused as to what is going on there.
(In reply to comment #5) > I can understand this fixing the issue, but I believe from analysis it should > also fix the failing test_storage_mozStorage_migrate.js test, but it doesn't, > so I'm a bit confused as to what is going on there. The same would apply to test_db_update_v999b.js, and in theory test_db_update_v1.js should pass as well but I don't know what's going there.
It would appear that a surprising amount of code's migration logic generates SQL statements to test if a column is present. My change breaks logic that only creates the statement and does not try and execute the statement or otherwise use it. So far, this seems to be most migration logic. (This does not affect non-migration logic because that's the only case it makes sense to try and compile illegal SQL.) The crashy case of passwordmgr had failing but non-crashing tests because it does not create statement wrappers in the migration test cases, just in general use. I have fixed passwordmgr and am in the process of patching all the other mozilla/ cases. The changes should be fairly minimal in each case, but it does raise the question of whether the right fix is to do the proper long-term 1.9.2 implementation. The reason I had avoided that in the first place was that it would not be a transparent change to gloda.
This makes those tests pass for me as well as all the tests for the other code I modified after inspection. I fixed-up places too, but I'm unsure how to test it... my suspicion is we don't build it and I did too much work like a sucker. I'm not crazy about my changes having required this, but it's still probably the best option for the current moment.
yeah, we don't build places, afaik.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
With my patch, doing "make xpcshell-tests" in my objdir nets me: INFO | Result summary: INFO | Passed: 565 INFO | Failed: 0 Which is reasonably promising.
I think I can also characterize what the crash was, too... As noted, the migration code does not construct the statement wrapper that is the source of the crash. And the statement wrapper only crashes if it is given illegal SQL. So I think the part where I broke the migration logic tricked the system into thinking it did not need to run migration. Then, it tried to create a statement wrapper on what was illegal SQL because of the lack of migration, causing the crash. My changes to the storage-mozStorage.js file that make the migration logic correct do this by attempting to create a statement wrapper. This forces the construction of the SQL statement (which fails), while also testing that we no longer crash. (In other words, without the fix I made, the newly changed unit tests would crash and burn, leading to a fail.)
Pushed to TB 3.0 rc1 relbranch (only!): http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6dc036c10334
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed RC1 build 3]
Target Milestone: --- → Thunderbird 3
Whiteboard: [fixed RC1 build 3] → [fixed RC1 build 3][fixedtb301]
v.fixed columnName does not appear in any crashes after build 20091117 http://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&date=12%2F2%2F2009&range_value=2&range_unit=weeks&query_search=signature&query_type=exact&query=columnName&do_query=1 However ... there are new crashes appearing as of 2009112100 where sqlite is prominent, and SePaplNT.dll and SePcmnN appears http://crash-stats.mozilla.com/report/list?product=Thunderbird&version=Thunderbird%3A3.0&query_search=signature&query_type=exact&query=&date=&range_value=2&range_unit=days&do_query=1&signature=nsAddrDatabase%3A%3AGetRowForCharColumn%28unsigned%20short%20const*%2C%20unsigned%20int%2C%20int%2C%20int%2C%20nsIMdbRow**%2C%20int*%29 http://crash-stats.mozilla.com/report/list?product=Thunderbird&version=Thunderbird%3A3.0&query_search=signature&query_type=exact&query=&date=&range_value=2&range_unit=days&do_query=1&signature=memcpy%20|%20dataBufferAppend2
Status: RESOLVED → VERIFIED
You pasted two sets of crash-stats things. The address book thing is mork, not SQLite. The other thing looks like all but 1 of the crashes are from one person whose database is corrupt, possibly because they have all kinds of weird DLLs in their process that should not be there. The other 1 also looks like it has stuff in there that should not be in there. Thank you for bringing these to my attention, though. It's unfortunate crash-stats doesn't have better analytical and aggregation tools built in. Hopefully dbaron keeps at his correlation work.
(In reply to comment #14) > Thank you for bringing these to my attention, though. It's unfortunate > crash-stats doesn't have better analytical and aggregation tools built in. > Hopefully dbaron keeps at his correlation work. That's being worked on and getting in place.
You need to log in before you can comment on or make changes to this bug.