Closed Bug 346933 Opened 14 years ago Closed 14 years ago

crashes [@ morkRowObject::CloseRowObject] during shutdown

Categories

(Toolkit :: Autocomplete, defect)

1.8 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.8.1

People

(Reporter: dbaron, Assigned: Bienvenu)

Details

(Keywords: fixed1.8.1, topcrash)

Crash Data

Attachments

(2 files, 2 obsolete files)

There are a bunch of crashes in Firefox builds on MOZILLA_1_8_BRANCH, most of the comments mentioning shutdown (although some mention wikipedia URLs, I think).  I'm going to guess that the top frame on the stack just ends up that way by reliable coincidence, although I'd like to look into the detailed talkback data more carefully.

An example stack is:

Incident ID: 21669156
Stack Signature	morkRowObject::CloseRowObject 6eeeef8a
Product ID	Firefox2
Build ID	2006080104
Trigger Time	2006-08-01 10:20:10.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	firefox.exe + (00053828)
URL visited	
User Comments	
Since Last Crash	2119 sec
Total Uptime	2119 sec
Trigger Reason	Access violation
Source File, Line No.	c:/builds/tinderbox/Fx-Mozilla1.8/WINNT_5.2_Depend/mozilla/db/mork/src/morkRowObject.cpp, line 606
Stack Trace 	
morkRowObject::CloseRowObject  [c:/builds/tinderbox/Fx-Mozilla1.8/WINNT_5.2_Depend/mozilla/db/mork/src/morkRowObject.cpp, line 606]
DOMGCCallback  [c:/builds/tinderbox/Fx-Mozilla1.8/WINNT_5.2_Depend/mozilla/dom/src/base/nsJSEnvironment.cpp, line 2224]
js_ForceGC  [c:/builds/tinderbox/Fx-Mozilla1.8/WINNT_5.2_Depend/mozilla/js/src/jsgc.c, line 2251]

(That's all the stack talkback recorded.)
It seems most likely that this is a problem with GC callback users, not the JS engine itself, although it's hard to guess which GC callback user.
Flags: blocking1.8.1?
Severity: normal → critical
OS: Linux → Windows XP
Version: Trunk → 1.8 Branch
Flags: blocking1.8.1? → blocking1.8.1+
I wonder if a call is missing on the stack - is there js code that holds onto a history entry, which wraps a mork row?  If the code that shutdowns history closes the mork db before the gc runs, that'll produce a crash like this.
The raw stack data in the detailed reports in talkback wasn't helpful here -- I'm not sure how talkback even got the stack that it has.  The registers show that EBP and EIP are both null, and ESP seems to me to be trashed, pointing to a chunk of memory that looks like XBL-related or JS_related data structures.

Bug 334263's appearance in talkback data is similar in some ways, except that the memory it shows as the raw stack dump does look like a stack, although like a different one.
Attached file Mac OS X crashlog
This crash occurred on shutdown when updating to a recent nightly. The stack's a little different, but I think it's the same bug.
Do you know exactly which nightly that was in?
(In reply to comment #2)
> I wonder if a call is missing on the stack - is there js code that holds onto a
> history entry, which wraps a mork row?  If the code that shutdowns history
> closes the mork db before the gc runs, that'll produce a crash like this.

So is this a bug in mork, or in the user of mork (in this case, it appears to be autocomplete)?
Component: JavaScript Engine → XTF
Yeah, sorry I didn't mention it. The stack I attached is from a 2006-08-09 nightly.
it's a bug in the user of autocomplete - you have to release your row pointers before shutting down the db that contains them.

Autocomplete uses the history db, right? (if not, the rest of this wild speculation can be ignored :-) ) So Autocomplete needs to get cleaned up before the history db is shut down. I think we've had a problem like this before...I'm not sure what service is getting free'd on ispiked's last stack trace but it might be the case that history closes the db at component shutdown time, before services are free'd. History closes the db if it gets a "profile-before-change" notification, or when the nsGlobalHistory service(?) gets deleted.
Assignee: general → nobody
Component: XTF → Autocomplete
Flags: blocking1.8.1+
Product: Core → Toolkit
QA Contact: general → autocomplete
Flags: blocking-firefox2+
Flags: blocking-firefox2+ → blocking-firefox2?
Would this appear as a shutdown crash only, or might it occur at other times?
I would say shutdown-only, unless there's a situation where a "profile-before-change" notification could be sent.
Flags: blocking-firefox2? → blocking-firefox2+
Target Milestone: --- → mozilla1.8.1
OS: Windows XP → All
Hardware: PC → All
Steps to reproduce:

1) Create new profile
2) Type some text in searchbar (Google search engine)
3) Press Enter-key (search is performed)
4) Close Firefox (File -> Exit)
5) Start Firefox
6) Click in searchbar
7) Press Down arrow key (search history appears)
8) Press ESC key
9) Close Firefox (File -> Exit)
-> Firefox crashes on exit

TB22105803Q
TB22105743K

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1b1) Gecko/20060814 BonEcho/2.0b1 ID:2006081403
I looked at the code briefly, and I'm not entirely comfortable labeling this as a bug in the user of mork.  In a garbage-collected system, you really can't depend on the ordering of destructors (and destructors certainly shouldn't trigger any user-visible behavior).  It seems like mork ought to be able to handle destructors running in any order.

That said, it may well be easier to fix this in the caller, although I'm not very familiar with the autocomplete code, and I didn't see an obvious way.
Attached patch This might or might not help (obsolete) — Splinter Review
Currently nsIMdbEnv and nsIMdbTable aren'd owned by the autocompleteresult.
So for example if nsFormHistory, which does own those, gets deleted before
the result, nsAutoCompleteMdbResult::mEnv may point to a deleted object and
if I read the code right nsCOMArray<nsIMdbRow> mResults may contain
objects which were allocated using mEnv.

Note, I haven't been able to reproduce the crash.
Comment on attachment 234188 [details] [diff] [review]
This might or might not help

Oops, wrong patch
Attachment #234188 - Attachment is obsolete: true
Attached patch this might help (obsolete) — Splinter Review
Comment on attachment 234191 [details] [diff] [review]
this might help

Bienvenu, since you know how mork works, what do you think about this?
Attachment #234191 - Flags: first-review?(bienvenu)
Comment on attachment 234191 [details] [diff] [review]
this might help

I'm not 100% sure of my previous comments - they describe the way clients used to use Mork, but I forgot that when I made nsIMdb* objects true xpcom/nsISupports objects, I got rid of uses of CloseMdbObject. But I still think you don't want any rows left in memory by the time the other objects in the db are released (in particular, the store and the table).

So this might actually help.  Since nsAutoCompleteMdbResult's weren't holding a ref to mEnv or mTable, if GlobalHistory releases them before nsAutoCompleteMdbResult is destroyed,  the underlying env or store might have gone away...

I'm building ff2, and I'm going to see if I can recreate this, and if I can get a better idea of the usual shutdown order.
OK, I built ffox2 and recreated the crash - the underlying store had been deleted. I'll see if the patch helps...
OK, history is closing the store in the onprofilechanged notification; at a much later point, js is cleaning up loaded components and the store is long gone. I can try this patch, but I rather doubt it's going to help. I suspect the autocomplete result is going to need to hold onto the store to prevent this. An other possibility is for the js component that's holding onto these results to null out the row before the store goes away though you still have ordering problems. The msg db code actually nulls out all the rows before closing the store, but this requires keeping track of all the rows - not sure if history can do that.
Attached patch proposed fixSplinter Review
This does nothing about the underlying problem, but it fixes the crash really easily. Gavin, am I correct in assuming we don't need _formHistoryResult after this point?
Attachment #234191 - Attachment is obsolete: true
Attachment #234450 - Flags: first-review?
Attachment #234191 - Flags: first-review?(bienvenu)
(In reply to comment #20)
> Gavin, am I correct in assuming we don't need _formHistoryResult after
> this point?

Yes, that patch looks correct to me.
Comment on attachment 234450 [details] [diff] [review]
proposed fix

Gavin, who should I ask a second-review of? I'm not familiar with the FF review protocol...and are you the right person to ask for a first-review?

Also, before I forget, as I suspected, the previous patch to hold onto the table and env pointers didn't fix the problem
Comment on attachment 234450 [details] [diff] [review]
proposed fix

(In reply to comment #22)
> (From update of attachment 234450 [details] [diff] [review] [edit])
> Gavin, who should I ask a second-review of? I'm not familiar with the FF review
> protocol...and are you the right person to ask for a first-review?

I guess so, though mconnor should probably stamp this too (as a browser peer).

one little nit: remove the addition of the blank line.
Attachment #234450 - Flags: second-review?(mconnor)
Attachment #234450 - Flags: first-review?
Attachment #234450 - Flags: first-review+
Assignee: nobody → bienvenu
Comment on attachment 234450 [details] [diff] [review]
proposed fix

seeking a= for 1.8.1.  (hoping mconnor will do one of his famous sr+a so that david can get this one in before ff2b2 goes out the door!)
Attachment #234450 - Flags: approval1.8.1?
Should we also null out this._formHistoryResult from the timer 'notify' callback?
(In reply to comment #25)
> Should we also null out this._formHistoryResult from the timer 'notify'
> callback?

Nevermind.  this._reset() clears this._formHistoryResult, so no worries.
Attachment #234450 - Flags: second-review?(mconnor) → second-review+
Comment on attachment 234450 [details] [diff] [review]
proposed fix

a=beltzner on behalf of drivers for the MOZILLA_1_8_BRANCH
Attachment #234450 - Flags: approval1.8.1? → approval1.8.1+
fix landed on the branch, on david's behalf.

thanks for fixing this, david.

david, do you need me to land this on the trunk for you?
Keywords: fixed1.8.1
Seth, if you could land it on the trunk that would be great. I've only got a 2.0 branch firefox tree...
> Seth, if you could land it on the trunk that would be great. I've only got a
> 2.0 branch firefox tree...

I've landed on the trunk.

and on the trunk, I've addressed gavin's nit:

> one little nit: remove the addition of the blank line.

sorry for forgetting to remove the blank line on the branch.

thanks again for the fix, david.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@ morkRowObject::CloseRowObject]
You need to log in before you can comment on or make changes to this bug.