Closed Bug 479729 Opened 15 years ago Closed 15 years ago

Unable to shutdown using async statements

Categories

(Toolkit :: Storage, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: Mook, Assigned: sdwilsh)

Details

(Keywords: fixed1.9.1)

Attachments

(3 files)

sdwilsh told me to file this one, I have no idea what the heck I'm doing ;)

Using his try server build, 2009-02-21_20:53-try-1bb16971859, there's some background thread in an infinite-looking loop (not this bug), and shutdown doesn't work because of that.

Attached stack is of the main thread (which appears to be eating CPU) and what looks like the loopy background sqlite thread.
Actually, looks like mozStorageTrasnaction keeps sleeping while trying to do the roolback, which means it keeps getting SQLITE_BUSY.  The thread that mozStorageConnection is trying to close won't go away.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: x86 → All
Summary: Deadlock on shutdown using async statements → Unable to shutdown using async statements
After talking with drh, we need to finalize our statements before we try to rollback.  It thinks there is pending work to do.
...and returns SQLITE_BUSY as a result.
Requesting blocking since this could very easily hit anyone who uses this API (multiple statements) and cancels.  I don't think any internal code does this, but add-ons could.
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Flags: blocking1.9.1?
Priority: -- → P2
Target Milestone: --- → mozilla1.9.2a1
Attached patch v1.0Splinter Review
Attachment #363770 - Flags: review?(bugmail)
Whiteboard: [needs review asuth]
As straight-forward as this problem is (especially after peeking at the sqlite sources to verify the problem), I figured I would try and add a unit test for sanity purposes.  This test hangs (as expected) without the patch, and passes (as expected) with the patch.  In many ways, I feel like this makes it a bad test.  If there's a way to convince xpcshell to quit hard, then this might be an okay test.

(In order to cause the async operation to be long running I populate a table with 256 rows which we then have a SELECT join across 4 times, for an expected result row count of 256**4... it is unlikely/hopeful that we get a chance to cancel() before that completes.  Since it stops doing things when we cancel, this turns out okay-ish.)
Attachment #363790 - Flags: review?(sdwilsh)
Attachment #363770 - Flags: review?(bugmail) → review+
(In reply to comment #6)
> result row count of 256**4... it is unlikely/hopeful that we get a chance to
> cancel() before that completes.  Since it stops doing things when we cancel,

Er, I meant to say it is unlikely that we won't get a chance to cancel before it completes / I am hopeful that we get a chance to cancel before it completes.  The cost is that we burn a bunch of CPU cycles for the privilege to actually get to cancel something.
Whiteboard: [needs review asuth] → [has review]
Comment on attachment 363790 [details] [diff] [review]
candidate unit test, although maybe it's not a good idea

Yeah, I thought about how to test this but couldn't think of anything that'd be reliable.  r- for the reasons you described.  I don't think this is really testable in a reliable way sadly.

I talked to drh about getting an api addition to track this in the future so in debug builds we can dump out an error at least.
Attachment #363790 - Flags: review?(sdwilsh) → review-
Flags: blocking1.9.1? → blocking1.9.1+
Shawn, I've tested your add-on and I've noticed that when I enter a new url inside the location bar and press enter - nothing happens for about 10-20 seconds. After that the page gets loaded. I had to stop further testing and deactivated the add-on for now.
(In reply to comment #9)
> Shawn, I've tested your add-on and I've noticed that when I enter a new url
> inside the location bar and press enter - nothing happens for about 10-20
> seconds. After that the page gets loaded. I had to stop further testing and
> deactivated the add-on for now.
not sure this is the bug you meant to comment in...
Oh yeah, bug 455555 was my target. Sorry.
http://hg.mozilla.org/mozilla-central/rev/cb41dd06a36d
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [has review]
Any way to verify the fix? Eventually by creating a tryserver build with one of the patches you have used for a trunk tryserver build and which caused this hang? If that would be possible could you prepare such a build?
I could do that, but the cost seems pretty high when we can just verify that this is fixed with a simple code inspection.
Or is it enough to have your add-on installed and check it this way?
one of the early versions could trigger this
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: