Closed
Bug 728653
Opened 12 years ago
Closed 12 years ago
Crash in mozilla::storage::Connection::stepStatement @ sqlite3_extended_result_codes
Categories
(Toolkit :: Storage, defect)
Tracking
()
RESOLVED
FIXED
mozilla13
People
(Reporter: scoobidiver, Assigned: espindola)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(1 file, 4 obsolete files)
1.21 KB,
patch
|
mak
:
review+
|
Details | Diff | Splinter Review |
It's a low volume crash but there's a spike in crashes from 13.0a1/20120218. The regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2271cb92cc05&tochange=550779e6bab4 Signature sqlite3_extended_result_codes More Reports Search UUID af5e2efa-2bdf-4183-bcac-cd8622120219 Date Processed 2012-02-19 03:49:57 Uptime 5180 Last Crash 1.6 hours before submission Install Age 2.3 hours since version was first installed. Install Time 2012-02-19 01:31:57 Product Firefox Version 13.0a1 Build ID 20120218031156 Release Channel nightly OS Windows NT OS Version 6.1.7601 Service Pack 1 Build Architecture x86 Build Architecture Info AuthenticAMD family 15 model 31 stepping 0 Crash Reason EXCEPTION_ACCESS_VIOLATION_READ Crash Address 0x44 App Notes AdapterVendorID: 0x10de, AdapterDeviceID: 0x01d3, AdapterSubsysID: 140110b0, AdapterDriverVersion: 8.17.12.8562 D2D? D2D- D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers+ EMCheckCompatibility True Total Virtual Memory 4294836224 Available Virtual Memory 3806265344 System Memory Use Percentage 83 Available Page File 4242325504 Available Physical Memory 362340352 Frame Module Signature [Expand] Source 0 mozsqlite3.dll sqlite3_extended_result_codes db/sqlite3/src/sqlite3.c:114877 1 xul.dll mozilla::storage::Connection::stepStatement storage/src/mozStorageConnection.cpp:881 2 xul.dll mozilla::storage::BindingParams::`scalar deleting destructor' 3 mozalloc.dll mozalloc.dll@0x10af 4 xul.dll mozilla::storage::Statement::ExecuteStep storage/src/mozStorageStatement.cpp:618 5 xul.dll mozilla::storage::Statement::Execute storage/src/mozStorageStatement.cpp:582 6 xul.dll nsAnnotationService::RemoveAnnotationInternal toolkit/components/places/nsAnnotationService.cpp:1437 7 xul.dll nsAnnotationService::RemoveItemAnnotation toolkit/components/places/nsAnnotationService.cpp:1465 8 xul.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102 9 xul.dll XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1542 10 xul.dll xul.dll@0x10861f 11 mozjs.dll js::InvokeKernel js/src/jsinterp.cpp:499 12 mozjs.dll js::Interpret js/src/jsinterp.cpp:2694 ... More reports at: https://crash-stats.mozilla.com/report/list?signature=sqlite3_extended_result_codes
Reporter | ||
Comment 1•12 years ago
|
||
It's currently #2 top browser crasher in 13.0a1 over the last day. It might be a regression from bug 720258.
Keywords: topcrash
Comment 2•12 years ago
|
||
If I should blindly pointing to a changeset in that range I'd probably say bug 721603, so may be a ownership bug that we hit when spinning. May be hard to reproduce. The call comes from js and we don't have the js stack, something invoking removeItemAnnotation on shutdown. (In reply to Scoobidiver from comment #1) > It's currently #2 top browser crasher in 13.0a1 over the last day. > > It might be a regression from bug 720258. hard to tell, though that bug doesn't touch annotations at all.
Comment 3•12 years ago
|
||
some stack has "unrelated" je_free or mozalloc calls in the middle, that may confirm looping is related. We may try to disable spinning for a Nightly and see if the crashes disappear, just to get some additional data.
Comment 4•12 years ago
|
||
I also got this crash this morning when I closed Firefox: https://crash-stats.mozilla.com/report/index/bp-22bf4442-c1ad-48cf-84ac-eac322120221
Comment 5•12 years ago
|
||
thanks Marco, this confirms it's a crash on shutdown, so my suspect is probably correct.
Blocks: 721603
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → respindola
Assignee | ||
Comment 6•12 years ago
|
||
Looks like a case where spinning the loop makes we run code we were not running before and that code expects the database to be open. Investigating.
Assignee | ||
Comment 7•12 years ago
|
||
I added asserts to try to make this more reproducible and pushed to https://tbpl.mozilla.org/?tree=Try&rev=a6fbecc27538
Assignee | ||
Comment 8•12 years ago
|
||
The try looks green :-( This is my first time trying to debug a nightly crash. Any tips? A way of getting a js stack or at least the values passed to RemoveItemAnnotation would be very helpful. I tried proof reading the js code that calls it, but quickly got to PlacesUIUtils.jsm which has many posible callers. I downloaded the build from that try run and I am currently trying to change bookmarks, set up sync, etc to try to make it crash.
Assignee | ||
Comment 9•12 years ago
|
||
btw, I agree with mac that this would go away if we stopped spinning the loop. Should we revert that patch for now?
Comment 10•12 years ago
|
||
I'd just comment out the Spin() call with a comment pointing to this bug for now, that would be enough right?
Comment 11•12 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #6) > Looks like a case where spinning the loop makes we run code we were not > running before and that code expects the database to be open. Investigating. This is exactly why we should avoid spinning the event loop at all costs.
Assignee | ||
Comment 12•12 years ago
|
||
> This is exactly why we should avoid spinning the event loop at all costs.
No, there are other reasons, but this is not one of them.
We are just stays alive longer and and code is trying to use the places db after it has been shutdown
I will upload a patch commenting the spin and pointing to this bug.
Assignee | ||
Comment 13•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=d07f5ba30267
Attachment #599572 -
Flags: review?(mak77)
Comment 14•12 years ago
|
||
Comment on attachment 599572 [details] [diff] [review] disable the spinning for now Review of attachment 599572 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/places/Database.cpp @@ +1768,5 @@ > new BlockingConnectionCloseCallback(); > + // The spinning is temporarily disabled. See bug 728653. > + (void)mMainConn->AsyncClose(NULL); > + // closeListener->Spin(); > + closeListener->Complete(); just move the comment before the commented out Spin you can still pass closeListener to AsyncClose and it will notify on complete by itself, why do you have to remove that and call manually? as it is this will notify too early.
Assignee | ||
Comment 15•12 years ago
|
||
I was afraid of the function returning before the notification, but that is what we used to do, so it is probably the best way of disabling this.
Attachment #599572 -
Attachment is obsolete: true
Attachment #599576 -
Flags: review?(mak77)
Attachment #599572 -
Flags: review?(mak77)
Assignee | ||
Comment 16•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=6acb2ca19a5c
Comment 17•12 years ago
|
||
Comment on attachment 599576 [details] [diff] [review] new patch Review of attachment 599576 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/places/Database.cpp @@ +1767,5 @@ > nsRefPtr<BlockingConnectionCloseCallback> closeListener = > new BlockingConnectionCloseCallback(); > (void)mMainConn->AsyncClose(closeListener); > + // The spinning is temporarily disabled. See bug 728653. > + // closeListener->Spin(); really-nit: all those whitespaces look useless!
Attachment #599576 -
Flags: review?(mak77) → review+
Assignee | ||
Comment 18•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=529fdd0c82a1
Assignee | ||
Comment 19•12 years ago
|
||
Reverted the accidental bits in https://hg.mozilla.org/integration/mozilla-inbound/rev/44b5705eeb21
Assignee | ||
Comment 20•12 years ago
|
||
looking at the stacks, they look like nsAnnotationService::RemoveItemAnnotation js garbage mozilla::net::HttpBaseChannel::DoNotifyListener (calls OnStartRequest) mozilla::net::HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort nsRunnableMethodImpl<void , true>::Run the last runnable is created by AsyncAbort (via AsyncCall). And the AsyncClose Run method looks like (void)mConnection->internalClose(); <- sets mDBConn to NULL if (mCallbackEvent) (void)mCallingThread->Dispatch(mCallbackEvent, NS_DISPATCH_NORMAL); What I think is happening is that the TODO list for our thread looks like ...| AsyncCloseConnection | HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort | When AsyncCloseConnection::Run executes, it will set mDBConn to NULL and dispatch the callback. The TODO list now looks like | HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort | CallBack When the HandleAsyncAbort executes mDBConn is already null and we get our segmentation fault. I can see two ways we can fix this: * Make sure all http requests are done by the time we try to close the places db (and make sure no new ones are created) * Make anything that has a callback from an http realize that the db might already be closed. The second option is probably the best unless we have something that really must be written to disk because an http connection got canceled (really?). Now, any guesses on which onStartRequest implementation we have on the stack?
Assignee | ||
Comment 21•12 years ago
|
||
Mak, what do you think? The idea is that notifications about connections are not interesting for anything we have to record on disk, so we skip them during shutdown. https://tbpl.mozilla.org/?tree=Try&rev=e27bd3c076a6
Attachment #599716 -
Flags: review?(mak77)
Comment 22•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/529fdd0c82a1 https://hg.mozilla.org/mozilla-central/rev/44b5705eeb21 Leaving open for unreviewed patch. In the future, please can you add [leave open] to the whiteboard (along the lines of https://wiki.mozilla.org/Tree_Rules/Inbound#Please_do_the_following_after_pushing_to_inbound), since I only happened to spot this by chance + once I create something automated, the whiteboard will be the only thing it looks at.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla13
Comment 23•12 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #20) > looking at the stacks, they look like So, based on IRC discussion, I think what was hitting annotations on shutdown were livemarks. The description of the events matches perfectly what they do. Bug 613588 removes all of this, though add-ons may still do something similar, so we still need a more general fix. looks like > ...| AsyncCloseConnection | > HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort | > > When AsyncCloseConnection::Run executes, it will set mDBConn to NULL and > dispatch the callback. The TODO list now looks like > > | HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort | CallBack > > When the HandleAsyncAbort executes mDBConn is already null and we get our > segmentation fault. This is very plausible, though I don't see how spinning the loop makes this different from before. I suppose that event was executed before too, on merging the threads. Why were we not crashing? And why do we see destructors and free calls in the middle of the stack? My fear is that really we are spinning, and one of the events is spinning as well (my supposition here is Sync). We may need to add some additional crash protection in Storage in addition to fixes here, to avoid entering SQLite in a bad state. > I can see two ways we can fix this: > > * Make sure all http requests are done by the time we try to close the > places db (and make sure no new ones are created) > * Make anything that has a callback from an http realize that the db might > already be closed. > > The second option is probably the best unless we have something that really > must be written to disk because an http connection got canceled (really?). Well, considered the kind of data we handle at the end of those requests, I see no problem in Places and many other components. Though I can't speak globally for all the codebase, there may be something that relies on these notifications. Thus I think the healthier approach is to start getting feedback from Necko peers, I don't know that code good enough to tell which are the implications of the change.
Assignee | ||
Comment 24•12 years ago
|
||
> So, based on IRC discussion, I think what was hitting annotations on > shutdown were livemarks. The description of the events matches perfectly > what they do. Bug 613588 removes all of this, though add-ons may still do > something similar, so we still need a more general fix. So that should fix this. I still think we should handle the general case if we can. > This is very plausible, though I don't see how spinning the loop makes this > different from before. It makes us get here. Without it we would quick with the last TODO items scheduled but not executed. In particular, we would not execute the notification that an http connection was canceled. > I suppose that event was executed before too, on > merging the threads. Why were we not crashing? This part I do not know, but before we spinned the loop we would quick without internalClose running, so we know that some items were not being run. The http notifications where probably one of them. > And why do we see destructors > and free calls in the middle of the stack? Like this: 2 xul.dll mozilla::storage::BindingParams::`vector deleting destructor' 3 mozalloc.dll mozalloc.dll@0x10af Looks like it is just breakpad getting confused with destroying BindingParams. Note that this doesn't show up with other compilers. > My fear is that really we are > spinning, and one of the events is spinning as well (my supposition here is > Sync). Then the that would show up in the stack. The spin point is way up from that destructor call. > We may need to add some additional crash protection in Storage in addition > to fixes here, to avoid entering SQLite in a bad state. Something like what I added in https://hg.mozilla.org/try/rev/a6fbecc27538 with the "what do we do" replaced by a return? I would like to avoid that if we can as I am afraid it will just hide harder to find bugs. > Thus I think the healthier approach is to start getting feedback from Necko > peers, I don't know that code good enough to tell which are the implications > of the change. cced biesi.
Assignee | ||
Updated•12 years ago
|
Attachment #599716 -
Flags: feedback?(jduell.mcbugs)
Assignee | ||
Comment 25•12 years ago
|
||
I don't think there is a way to have multiple feedback?, but we need someone to comment on the Necko changes.
Assignee | ||
Updated•12 years ago
|
Attachment #599716 -
Flags: feedback?(cbiesinger)
Assignee | ||
Comment 26•12 years ago
|
||
I just realized that the behavior for NeckoChild.cpp is unchanged: + HttpChannelChild* httpChannel = new HttpChannelChild(NULL); .... + if (mListener && !(mConnMgr && mConnMgr->isShuttingDown())) { mak, does this really need to block on Necko feedback?
Comment 27•12 years ago
|
||
Comment on attachment 599716 [details] [diff] [review] don't notify during shutdown This makes me incredibly scared and I really don't want to see this checked in. Not sending any listener notifications seems like a recipe for disaster to me, it is breaking a longstanding necko contract that listener notifications will always be sent. Can you just handle this in the listener somehow?
Attachment #599716 -
Flags: feedback?(cbiesinger) → feedback-
Comment 28•12 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #26) > mak, does this really need to block on Necko feedback? I'm not sure what's the question, I surely can't approve any changes at the Necko level. I'd be fine if we'd figure out a fix in Storage to avoid crashing (and possibly a testcase).
Assignee | ||
Comment 29•12 years ago
|
||
> This makes me incredibly scared and I really don't want to see this checked
> in. Not sending any listener notifications seems like a recipe for disaster
> to me, it is breaking a longstanding necko contract that listener
> notifications will always be sent. Can you just handle this in the listener
> somehow?
May I ask what is the use of getting a notification of a http connection being closed during shutdown? What are you doing with the notification?
I can use a check on the lines of
if (we are shutting down)
drop write request
in the annotation service. That is effectively what is happening today since we exit without waiting even for the places db connection to be closed. Do you really find that a better solution? If so I can upload a patch for review.
Assignee | ||
Comment 30•12 years ago
|
||
> I'd be fine if we'd figure out a fix in Storage to avoid crashing (and > possibly a testcase). I can easily implement the test in comment 29. Are you ok with it? A testcase without stubbing the shutdown process is hard to write as we need to have a connection shutdown propagating to an annotation request just at the right time. From comment 20, the thread TODO list has to look like ...| AsyncCloseConnection | HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort |
Comment 31•12 years ago
|
||
Killing the annotations writes is not a general enough fix imo, it's just one thing out of the thousands add-ons may do, that would still crash in the same exact point. And doesn't prevent anyone from adding other problematic writes in future. Would not be even useful, since we already fixed this specific crash instance by rewriting livemarks. We have uncovered a possible crashing point in Storage, and we should stop the crash from Storage, so that trying to use a statement this late would just abruptly fail, without crashing. Figuring out a stop-crash strategy in Storage would also probably allow to write a proper test.
Assignee | ||
Comment 32•12 years ago
|
||
Attachment #599576 -
Attachment is obsolete: true
Attachment #599716 -
Attachment is obsolete: true
Attachment #601113 -
Flags: review?(mak77)
Attachment #599716 -
Flags: review?(mak77)
Attachment #599716 -
Flags: feedback?(jduell.mcbugs)
Comment 33•12 years ago
|
||
(In reply to Christian :Biesinger (don't email me, ping me on IRC) from comment #27) > Comment on attachment 599716 [details] [diff] [review] > don't notify during shutdown > > This makes me incredibly scared and I really don't want to see this checked > in. Not sending any listener notifications seems like a recipe for disaster > to me, it is breaking a longstanding necko contract that listener > notifications will always be sent. Can you just handle this in the listener > somehow? We need to change how we do shutdown for bug 662444. That's involves changing some semantics. I don't see why it's incredibly scary to avoid telling someone about our connections on shutdown.
Comment 34•12 years ago
|
||
So, at the time of the review I wasn't aware that the plan was to shutdown using exit(0), but still, it does not seem like the right fix is to avoid sending status updates when you could easily change the recipient of the update to not crash. Some listeners may want to do stuff, even on shutdown.
Assignee | ||
Comment 35•12 years ago
|
||
(In reply to Christian :Biesinger (don't email me, ping me on IRC) from comment #34) > So, at the time of the review I wasn't aware that the plan was to shutdown > using exit(0), but still, it does not seem like the right fix is to avoid > sending status updates when you could easily change the recipient of the > update to not crash. Some listeners may want to do stuff, even on shutdown. *may* is the key. It is possible to imagine something that breaks any interesting patch, but to get to exit(0) will have to change the semantics of which messages are available during shutdown and have code adapt accordingly. Having said that, lets try to keep this bug to the minimal possible to avoid accessing a null db during shutdown. I don't like the current patch but it seems to be what most prefer, so be it.
Assignee | ||
Comment 36•12 years ago
|
||
A try push is at https://tbpl.mozilla.org/?tree=Try&rev=ec91b6ea3a3f
Comment 37•12 years ago
|
||
Comment on attachment 601113 [details] [diff] [review] "fix" the problem Review of attachment 601113 [details] [diff] [review]: ----------------------------------------------------------------- I sent a mail to sdwilsh, asuth and Rafael with some deeper thoughts on the situation at shutdown and how we may crash, just extracting what happens here, imo 1. all cached statements are finalized 2. asyncClose runs 3. something asks the StatementCache for removeItemAnnotationStatement, it makes a new one 4. internalClose runs and sets mDBConn to null, it fails to close the connection since there is an existing statement, but regardless we set mDBConn to null! 5. something asks the StatementCache for removeItemAnnotationStatement, it exists in the cache, so it is just returned 6. the statements gets executed, Statement::ExecuteStep() is invoked, and it invokes mDBConnection->stepStatement() (the statement internal pointer to the connection is still valid, so this doesn't crash) 7. stepStatement tries to use mDBConn, null dereference, crash Based on this I think this patch is correct but wants a much nicer comment explaining how we may enter here with a null mDBConn, that is when closing the connection fails due to non finalized statements and one of this non finalized statements is executed after internalClose. Though we also want an additional bug to make Storage less lean to this kind of shutdown crashes, since this is one possible crash point but there may be other hitting similar paths. For this I'm asking feedback to long experienced Storage peers.
Attachment #601113 -
Flags: review?(mak77) → feedback+
Assignee | ||
Comment 38•12 years ago
|
||
> 1. all cached statements are finalized > 2. asyncClose runs > 3. something asks the StatementCache for removeItemAnnotationStatement, it > makes a new one > 4. internalClose runs and sets mDBConn to null, it fails to close the > connection since there is an existing statement, but regardless we set > mDBConn to null! > 5. something asks the StatementCache for removeItemAnnotationStatement, it > exists in the cache, so it is just returned > 6. the statements gets executed, Statement::ExecuteStep() is invoked, and it > invokes mDBConnection->stepStatement() (the statement internal pointer to > the connection is still valid, so this doesn't crash) > 7. stepStatement tries to use mDBConn, null dereference, crash Looks like a reasonable hypotheses and in line with comment 20. > Based on this I think this patch is correct but wants a much nicer comment > explaining how we may enter here with a null mDBConn, that is when closing > the connection fails due to non finalized statements and one of this non > finalized statements is executed after internalClose. I still think the patch is wrong, but will add a comment. > Though we also want an additional bug to make Storage less lean to this kind > of shutdown crashes, since this is one possible crash point but there may be > other hitting similar paths. For this I'm asking feedback to long > experienced Storage peers.
Assignee | ||
Comment 39•12 years ago
|
||
Attachment #601113 -
Attachment is obsolete: true
Attachment #601612 -
Flags: review?(mak77)
Comment 40•12 years ago
|
||
Comment on attachment 601612 [details] [diff] [review] now with comments Review of attachment 601612 [details] [diff] [review]: ----------------------------------------------------------------- ::: storage/src/mozStorageConnection.cpp @@ +886,5 @@ > + // the list looking like > + // AsyncCloseConnection | HttpAsyncAborter<nsHttpChannel>::HandleAsyncAbort > + // When AsyncCloseConnection::Run executes the second time (now on the calling > + // thread), it will set mDBConn to null. It knows there will be no other > + // async user, but a sync one might still try to execute. I'd prefer something more related to Storage internals, and less verbose. The bug is always here for more insight. r=me with a comment like (please fix any english mistake I may have introduced): // mDBConn may be null if the executing statement has been created and cached // after a call to asyncClose() but before the connection has been nullified // by internalClose(). In such a case closing the connection fails due to // the existence of prepared statements, but mDBConn is set to null regardless. // This usually happens when other tasks using cached statements are asynchronously // scheduled for execution and any of them ends up after asyncClose. // See bug 728653 for details.
Attachment #601612 -
Flags: review?(mak77) → review+
Assignee | ||
Comment 41•12 years ago
|
||
btw, it is not a particularly good use of my time to be asked to write a comment just to have it be discarded. https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=b92e7499019e
Comment 42•12 years ago
|
||
right, next time I will directly write a suggested comment in the review, it's a good suggestion.
Comment 43•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b92e7499019e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•