Closed Bug 507199 Opened 10 years ago Closed 10 years ago

Fix race condition in finalizing asynchronous storage statements

Categories

(Toolkit :: Storage, defect, blocker)

All
Windows Server 2003
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: sdwilsh, Assigned: sdwilsh)

References

Details

(Keywords: crash, intermittent-failure, verified1.9.2)

Attachments

(2 files)

This has been happening on the places branch for a little while.  mak has not been able to reproduce it locally, and it appears to be windows only.

Most recent failure:
WINNT 5.2 places unit test on 2009/07/29 09:23:07  
http://tinderbox.mozilla.org/showlog.cgi?log=Places/1248884587.1248892210.567.gz&fulltext=1
Whiteboard: [orange]
WINNT 5.2 mozilla-central test everythingelse on 2009/07/29 14:09:00  
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1248901740.1248903720.8789.gz&fulltext=1
According to robarnold, the return code means we are doing a null pointer dereference.
Correction - it's an access violation.  May or may not be null.
Running these tests under valgrind doesn't come up with anything.
i'm now running those tests in an infinite loop... in more than 200 runs i've not seen any crash.
Maybe we should disable these tests until bug 506042 gets resolved?  This is happening a lot more than I'd like suddenly, and it's lack of reproducibility makes it next to impossible to debug.
I'm just going to disable these tests.  Once bug 506042 gets resolved, we can re-enable them and debug.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1249143659.1249148848.16320.gz
WINNT 5.2 mozilla-central unit test on 2009/08/01 09:20:59  

looks like the same thing except in test_browser_places\unit\test_browserGlue_shutdown.js ?
at this point this appear to be JS or cycle collector crash, disabling all tests is not an option and will simply move on the crash to another test.
on Windows xpcshell tests i still crash on bug 507424, could be worth reenabling tests after that has been fixed, to check if it's somehow related.
in comment 14 i meant i crash in other xpcshell tests, not during the above ones.
i'd like to try reenabling these tests, to see if the crashes were depending on some of the JS crashes recently fixed. I'll try to do that at a quiet time when i'l have enough time to follow the tree.
Summary: test_browserGlue_corrupt_nobackup_default.js and test_browserGlue_corrupt_nobackup.js randomly crash → test_browserGlue_corrupt_nobackup_default.js, test_browserGlue_corrupt_nobackup.js, test_browserGlue_shutdown.js randomly crash
i'd say there is nothing we can do at this time, till we get a valid stack trace, somehow.
Attached file Stack
Tracemonkey took the patch from bug 506042 ahead of mozilla-central, this is the stack from the xpcshell run where it crashed running test_browser_places\unit\test_browserGlue_shutdown.js.
sayrer was puzzling over this, but I guess this is the same crash as m-c, just something landed on tracemonkey to make it more frequent.
yeah, this is happening every run on TM windows. I think there's a race on shutdown in the storage code.
the same thing on Places branch, see bug 511965.
This is, indeed, a race condition on shutdown.  Here's what is happening:
1) We call Close on the Connection object.  This causes us to spin the background thread's event loop and process any pending events.
2) The events are processed, and then in AsyncExecuteStatements::notifyComplete we call finalize on the StatementData object.  This resets the statement, and then releases the reference to it.
3) Our reference count on our Statement goes to zero, so the destructor is called, which calls Statement::Finalize.
4) In Statement::Finalize, we try to dispatch an event to the background thread to finalize the async statement (this is needed to serialize access to the async statement).  However, at this time, we can no longer get the background thread.

The fix here is simple - we just synchronously finalize if we cannot get the background thread (because it's shutting down or has been shutdown).
Component: Places → Storage
Flags: blocking1.9.2?
QA Contact: places → storage
Target Milestone: --- → mozilla1.9.2b1
Attached patch v1.0Splinter Review
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Attachment #396480 - Flags: review?(bugmail)
This was happening in the wild too: http://➡.ws/蚯
Whiteboard: [orange] → [orange][needs review asuth]
No longer blocks: 478718
So, I think the tracemonkey merge turned this into constant Windows orange.
Severity: normal → blocker
Attachment #396480 - Flags: review?(bugmail) → review+
Whiteboard: [orange][needs review asuth] → [orange][can land]
Duplicate of this bug: 509481
please when landing this also re-enable disabled tests (see comment 11)
pushed by Jesse
http://hg.mozilla.org/mozilla-central/rev/2ee676e74eac

I'm going to re-enable the tests and mark this fixed.
re-enaled by Jesse :)
http://hg.mozilla.org/mozilla-central/rev/cf74f6f746c3
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
This impacts 1.9.2 as well.
Summary: test_browserGlue_corrupt_nobackup_default.js, test_browserGlue_corrupt_nobackup.js, test_browserGlue_shutdown.js randomly crash → Fix race condition in finalizing asynchronous storage statements
Whiteboard: [orange][can land] → [orange]
Target Milestone: mozilla1.9.2b1 → mozilla1.9.3a1
Attachment #396480 - Flags: approval1.9.2?
Keywords: crash
Attachment #396480 - Flags: approval1.9.2? → approval1.9.2+
Flags: blocking1.9.2? → blocking1.9.2+
assuming this is fixed with tests re-enabled and passing.
Keywords: verified1.9.2
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.