Last Comment Bug 359821 - Firefox just crashes after about 10 minutes of use [@ nsHTMLDocument::GetElementById]
: Firefox just crashes after about 10 minutes of use [@ nsHTMLDocument::GetElem...
Status: RESOLVED FIXED
STR in comment 30. fixed on trunk by ...
: crash, fixed1.8.0.13, fixed1.8.1.5, helpwanted, qawanted, topcrash
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86 Windows XP
: P1 critical with 1 vote (vote)
: ---
Assigned To: Pete Warden
: Hixie (not reading bugmail)
: Andrew Overholt [:overholt]
Mentors:
unknown
: 359973 (view as bug list)
Depends on: 348156
Blocks:
  Show dependency treegraph
 
Reported: 2006-11-07 09:47 PST by Stavan Desai
Modified: 2010-09-18 01:57 PDT (History)
23 users (show)
jonas: blocking1.9+
mconnor: blocking1.8.1.1-
dveditz: blocking1.8.1.5+
dveditz: wanted1.8.1.x+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Pre-alpha extension xpi that crash can be repro-ed with (18.71 KB, application/octet-stream)
2007-04-12 11:14 PDT, Pete Warden
no flags Details
Proposed patch (1.94 KB, patch)
2007-04-14 09:01 PDT, Pete Warden
no flags Details | Diff | Splinter Review
Updated Patch (1.94 KB, patch)
2007-04-18 17:36 PDT, Pete Warden
jonas: review-
Details | Diff | Splinter Review
Proposed patch with hash adding disabled on destroyed documents (8.96 KB, patch)
2007-05-02 08:54 PDT, Pete Warden
no flags Details | Diff | Splinter Review
Updated patch, same as above but using -w to hide whitespace differences (6.14 KB, patch)
2007-05-16 19:55 PDT, Pete Warden
jonas: review+
jonas: superreview+
Details | Diff | Splinter Review
Updated patch, as above but with conforming brace style (6.25 KB, patch)
2007-06-07 00:26 PDT, Pete Warden
no flags Details | Diff | Splinter Review
Patch as above, but against the trunk rather than the 2.0.3 branch (6.72 KB, patch)
2007-06-08 08:35 PDT, Pete Warden
no flags Details | Diff | Splinter Review
Patch as 267546, but against 2.0.4 branch, with whitespace shown (8.93 KB, patch)
2007-06-21 22:14 PDT, Pete Warden
jonas: review+
dveditz: approval1.8.1.5+
dveditz: approval1.8.0.13+
Details | Diff | Splinter Review

Description Stavan Desai 2006-11-07 09:47:01 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

It does not matter which website or when, Firefox 2.0 seems to just crash after 10 minutes of use. I am using firefox 2.0 on Windows XP service pack 2. I have used Firefox for a long time, and only now am I having this problem. All virus scans by different programs came back negative, same for spyware. Just randomly, the firefox bug reporter comes on, sends the crash information and then firefox closes. 

Reproducible: Always
Comment 1 Nick Thomas [:nthomas] 2006-11-07 09:48:17 PST
Please give a few talkback ID's so we can trace the cause of the crashes. See http://kb.mozillazine.org/Talkback for how to get them.
Comment 2 Stavan Desai 2006-11-07 11:49:20 PST
Here are the Incident IDs so far
TB25657363Q, TB25651372H, TB25650554W, TB25624849K, TB25590893G, TB25590064Z, TB25572066W, TB25564431H, TB25563660X, TB25504882Z, TB25503020X, TB25481533Q, TB25480652M, TB25479335Z, TB25428351Y, TB25380519W
Comment 3 Nick Thomas [:nthomas] 2006-11-07 12:16:59 PST
Looks like the crashes all occur at 
  http://www.gamespot.com/xbox360
I couldn't reproduce the crash surfing around a bit and leaving it sitting for 15 minutes or so (using Firefox 2 w/ blank profile, Flash 8.0 r24, win32)

The stacks end at a random memory address, or at nsHTMLDocument::GetElementById - about a 50/50 split, with 3 other stacks that are empty. We need someone who knows stack-fu to say more.
Comment 4 Adam Guthrie 2006-11-07 13:27:58 PST
Are you using any extensions? If so, do you still crash in safe mode (http://kb.mozillazine.org/Safe_mode).

Incident ID: 25480652
Stack Signature	nsHTMLDocument::GetElementById cb202024
Product ID	Firefox2
Build ID	2006101023
Trigger Time	2006-11-03 20:15:07.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	firefox.exe + (00196e33)
URL visited	http://www.gamespot.com/xbox360
User Comments	
Since Last Crash	182 sec
Total Uptime	75242 sec
Trigger Reason	Access violation
Source File, Line No.	c:/builds/tinderbox/Fx-Mozilla1.8-release/WINNT_5.2_Depend/mozilla/content/html/document/src/nsHTMLDocument.cpp, line 2479
Stack Trace 	
nsHTMLDocument::GetElementById  [mozilla/content/html/document/src/nsHTMLDocument.cpp, line 2479]
XPTC_InvokeByIndex  [mozilla/xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp, line 102]
XPCWrappedNative::CallMethod  [mozilla/js/src/xpconnect/src/xpcwrappednative.cpp, line 2169]
XPC_WN_CallMethod  [mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp, line 1455]
js_Invoke  [mozilla/js/src/jsinterp.c, line 1377]
js_InternalInvoke  [mozilla/js/src/jsinterp.c, line 1471]
JS_CallFunctionValue  [mozilla/js/src/jsapi.c, line 4419]
XPC_NW_FunctionWrapper  [mozilla/js/src/xpconnect/src/XPCNativeWrapper.cpp, line 387]
js_Invoke  [mozilla/js/src/jsinterp.c, line 1377]
js_Interpret  [mozilla/js/src/jsinterp.c, line 4121]
js_Invoke  [mozilla/js/src/jsinterp.c, line 1396]
js_InternalInvoke  [mozilla/js/src/jsinterp.c, line 1471]
JS_CallFunctionValue  [mozilla/js/src/jsapi.c, line 4419]
nsJSContext::CallEventHandler  [mozilla/dom/src/base/nsJSEnvironment.cpp, line 1493]
nsGlobalWindow::RunTimeout  [mozilla/dom/src/base/nsGlobalWindow.cpp, line 6714]
nsGlobalWindow::TimerCallback  [mozilla/dom/src/base/nsGlobalWindow.cpp, line 7086]
nsAppStartup::Run  [mozilla/toolkit/components/startup/src/nsAppStartup.cpp, line 152]
main  [mozilla/browser/app/nsBrowserApp.cpp, line 61]
kernel32.dll + 0x16fd7 (0x7c816fd7)
Comment 5 Stavan Desai 2006-11-07 14:17:37 PST
I am using extensions, but I have used them all for a long time, and it still does crash in safemode. Also, someone mentioned that all the crashes were at www.gamespot.com/xbox360. I have had crashes at other sites as well, such as my.yahoo.com, www.cnn.com etc. Sometimes, the talkback feedback agent doesn't come up, just the windows send error report. 
Comment 6 Olli Pettay [:smaug] 2006-11-07 14:22:02 PST
This is #67 FF2 crasher.
Comment 7 Olli Pettay [:smaug] 2006-11-07 14:30:59 PST
Boris, you may remember something about getelementbyid ;)
Comment 8 Boris Zbarsky [:bz] (still a bit busy) 2006-11-07 16:29:33 PST
That line is a CallQI call on something that's guaranteed to not be null.  Hence I conclude that it's dead.  Possibly the document is dead as well, though I'd expect that to crash sooner.

As for how that would happen... no idea.  This code is pretty different on the trunk.
Comment 9 Adam Guthrie 2006-11-08 14:36:56 PST
*** Bug 359973 has been marked as a duplicate of this bug. ***
Comment 10 Marius Strumyla 2006-11-09 08:59:43 PST
Another thing to check is to try with a clean profile.
Comment 11 Daniel Veditz [:dveditz] 2006-11-10 17:03:57 PST
Would love to see a fix, moved up to #23 on the topcrash list.

Jonas: any thoughts on who to assign this to if not you?
Comment 12 Laurabean113 2006-11-11 14:45:00 PST
I just installed Firefox 2.0 on my laptop and it is now crashing as well.  It crashed 2 times within the first 10 minutes of having 2.0 installed. I am using the default theme.
Comment 13 Robert Dominy 2006-11-17 11:42:32 PST
We are also seeing random, somewhat frequent exceptions being thrown by getElementById in our web application.  We haven't been able to boil it down to a simple test case, but would love to see this one get fixed.  Our web application also interacts with Flash on the page.
Comment 14 Boris Zbarsky [:bz] (still a bit busy) 2006-11-17 13:05:41 PST
Robert, I assume you're seeing these in 2.0 and not 1.5?  If so, would you possibly be willing to try some of the intermediate builds and see whether you can narrow down when the problem started?  Or give someone access to your web app?
Comment 15 Robert Dominy 2006-11-17 14:03:34 PST
Yes we are seeing the crashes in the 2.0 release.  Which intermediate releases would you suggest we test against?
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2006-11-17 14:28:40 PST
Robert, see http://archive.mozilla.org/pub/firefox/nightly/ -- it might take a bit to load the first time, sadly.  :(

The directories you're looking for have names like "2006-09-15-04-mozilla1.8/" -- that would be a build from the 1.8 Gecko branch (which Firefox 2 is based on), compiled on September 15, 2006, at 4am Pacific time.  There will generally be 2-3 directories like that per day (Linux, Windows, Mac builds).  These aren't releases, but development snapshots, but they should be pretty stable in general.

The earliest 1.8 branch build (pretty identical to Firefox 1.5) looks like 2005-08-09-12-mozilla1.8/  and the most recent one on archive right now is 2006-10-16-05-mozilla1.8/ -- more recent builds would be at <ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/>.

I've had good luck in the past with a binary search on the builds in cases like this.  For about a year's worth of builds, as in this case, only about 8-9 need to be tested...
Comment 17 Stavan Desai 2006-11-19 18:32:18 PST
I am the original poster of this bug. The problem used to occur mostly at www.gamespot.com/xbox360. However, recently, it has been happening more often at different locations, such as www.engadget.com, my.yahoo.com etc. Also, I have never done this before. Can I actually expect this bug to get fixed?
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2006-11-19 18:42:50 PST
As soon as someone with a debugger manages to reproduce it, probably...
Comment 19 Robert Dominy 2006-11-22 12:59:26 PST
Our QA group went back through the nightly builds and have narrowed the bug to beginning on the 7/07 build.  Builds prior to that do not exhibit the bug and ones   from that build onward exhibit the bug.  Our testers report that it seems more reproducible with Flash 8 installed (and using Flash on the page).

I can't be sure the bug we are seeing is identical to the original reported bug, so it may be helpful if some corroborates the build dates.
Comment 20 Nick Thomas [:nthomas] 2006-11-22 13:31:23 PST
Assuming the date is good, the checkins between the 06/July and 07/July Windows nightly are listed at
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=AviarySuiteBranchTinderbox&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=1152181140&maxdate=1152272400&cvsroot=%2Fcvsroot
which includes the changes to nsHTMLDocument from bug 333038 and the JS 1.7 landing.

Is it possible to get useful Talkback reports for a 07/July build ? Or are the symbols long gone ...

Comment 21 Boris Zbarsky [:bz] (still a bit busy) 2006-11-24 12:48:35 PST
Jay, see comment 20?

Stavan Desai, can you confirm the regression range?
Comment 22 Jay Patel [:jay] 2006-11-25 12:08:10 PST
cf:  We do have good data for crashes submitted when we *did* have symbols, so there are plenty of crashes stored in the DB from 7/7 builds:
http://talkback-public.mozilla.org/search/start.jsp?search=1&searchby=stacksig&match=contains&searchfor=&vendor=MozillaOrg&product=Firefox2&platform=All&buildid=20060707&sdate=&stime=&edate=&etime=&sortby=stack&rlimit=0
[
Comment 23 Boris Zbarsky [:bz] (still a bit busy) 2006-11-26 11:48:02 PST
Robert, could you paste in links to the first build you guys see issues with and the last one you don't, just to make sure we're all on the same page?

Also, does your application use designmode iframes?
Comment 24 Robert Dominy 2006-11-27 12:53:32 PST
Here are the file links we tracked the bug we're seeing to.

The bug begins occuring in this build:
 http://archive.mozilla.org/pub/firefox/nightly/2006-07-07-04-mozilla1.8/
 The file link:  firefox-2.0a3.en-US.win32.installer.exe 
 
The bug does NOT occur in this build:
 http://archive.mozilla.org/pub/firefox/nightly/2006-07-06-03-mozilla1.8/
 The file link:  firefox-2.0a3.en-US.win32.installer.exe  

Our application does not use designmode iframes.  It does use regular iframes and embedded Flash object.s
Comment 25 Boris Zbarsky [:bz] (still a bit busy) 2006-11-27 13:18:18 PST
OK.  If designmode is not being used, then bug 333038 is not too likely to be the problem.

Robert, I assume your stacks look like those cited in comment 2?

Looking at those stacks again, what's happening is that chrome JS is calling getElementById on a content document off a timeout (note the XPC_NW_FunctionWrapper on the stack).  Given the regression range, this makes bug 189039 a possible suspect... at the very least it's a change to chrome code that does in fact make getElementById calls (not sure about the timeout part).

Stavan, Robert, are you using typeahead find on pages with textboxes?

What would be ideal here would be someone catching this crash in a debugger and looking up the JS callstack... (e.g. by calling the DumpJSStack() function in the debugger).  That would certainly help with sorting out how we're getting to the crash site...
Comment 26 Olli Pettay [:smaug] 2006-11-30 04:48:19 PST
Just wondering if this can do something wrong:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/toolkit/components/typeaheadfind/src/nsTypeAheadFind.cpp&rev=&cvsroot=/cvsroot&mark=496,497#496
I'd write that using nsCOMPtr and .swap

Still reading bug 189039...
Comment 27 Boris Zbarsky [:bz] (still a bit busy) 2006-11-30 09:43:17 PST
That line could do the wrong thing if we're holding the only reference to the node, for sure.  But then we'd crash at this point...
Comment 28 Adam Guthrie 2007-02-24 13:00:48 PST
This crash has jumped up to the #14 topcrash for Firefox 2.0.0.1. I don't know if we're willing to reconsider blocking status based on that information.
Comment 29 John Ostrander 2007-04-06 06:15:58 PDT
(In reply to comment #28)
> This crash has jumped up to the #14 topcrash for Firefox 2.0.0.1. I don't know
> if we're willing to reconsider blocking status based on that information.
> 

Comment 30 Pete Warden 2007-04-12 09:42:08 PDT
I've been running into an issue that looks very similar with an extension I'm developing. It's crashing immediately below the CallQueryInterface function in GetElementById(). I've run it with my own build of BonEchoDebug, and MallocDebug with scribbling on free turned on, and I see the pointer to the element that's in in the e variable points to an area filled with 0x55, which means it's been freed.

I did also notice that mIsGoingAway always seems to be true in the document when I get this crash.

I can reproduce this with the following steps:
- Install my extension, which listens for google search result loads, and calls XMLHttpRequest's on the result links to determine if they exist, and have the search terms in their text
- Load a page of search results in google, and rapidly click next while there's a lot of requests in flight
- After two or three pages, I usually crash with a callstack that looks just like the one in comment #4. It dies right when I click the next button

It dies in my onreadystatechange callback from my logging, right after I've added an element to the current document by appending to another element's innerHTML. I make a call to retrieve the element I just added with that ID, and it dies in there.

I will attach my extension code if anyone else wants to help debug this. My next steps are going to be adding logging to the getElementById() code to see where the pointer to the element is being retrieved from (eg the hash map?), log the closing of the document, and try to trace the life of the element I add so I can see why it's getting freed. Oh, and I'm testing on OS X if that makes any difference. 
Comment 31 François Gagné 2007-04-12 10:19:57 PDT
Also attach the .xpi so someone could try it on Win32/Linux ?
Comment 32 Pete Warden 2007-04-12 11:14:06 PDT
Created attachment 261402 [details]
Pre-alpha extension xpi that crash can be repro-ed with

This is the extension xpi I mention in my comment, that's needed to reproduce the crash I'm seeing.

You can look at the source code by unzipping the xpi, and then unzipping the content.jar that's inside the xpi. The crashing function is sendPageRequest in petesearch.js

I've included heavy logging for that function, to narrow down where the crash occurs, using dump(). Logging indicates that the crash occurs inside the getPreviewElement(), which looks like this:

function getPreviewElement(document, target)
{
	return document.getElementById(getPreviewIDFromTarget(target));
}
Comment 33 François Gagné 2007-04-12 11:33:22 PDT
Pete, have you tryed to reproduce in _stock_ 2.0.0.3 or the new 1.9 nightly builds?
Comment 34 Olli Pettay [:smaug] 2007-04-12 12:01:30 PDT
Sounds like htmldocument doesn't get notified that element is removed 
from document. Should we remove content objects in ::Destroy (not just unbind) to get notifications or just clear mIdAndNameHashTable there.
Comment 35 Pete Warden 2007-04-12 12:35:31 PDT
(In reply to comment #33)
> Pete, have you tryed to reproduce in _stock_ 2.0.0.3 or the new 1.9 nightly
> builds?

Yes, it was in stock 2.0.3 that I saw this. I only built my own version to try and get some more debug information. I haven't tried it on top-of-tree yet.


Comment 36 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-13 02:04:03 PDT
I don't think we want to remove the content objects in ::Destroy on branch, that's too big of a change. Instead lets just clear the ID cache and use nsContentUtils::MatchElementId
Comment 37 Pete Warden 2007-04-13 09:40:15 PDT
To test the theory that this is caused by the hash map holding on to destroyed pointers, I added some logging. I had nsDocument::Destroy() log when it was called for each document, and made the following changes to nsHTMLDocument::GetElementById:

NS_IMETHODIMP
nsHTMLDocument::GetElementById(const nsAString& aElementId,
                               nsIDOMElement** aReturn)
{
...
fprintf(stderr, "GetElementById called on document %8x\n", this);
...
  if (e == ID_NOT_IN_DOCUMENT) {
...
fprintf(stderr, "GetElementById:: 0x%8x found in hash after flush\n",e);
  }
else
{
fprintf(stderr, "GetElementById:: 0x%8x found in hash before flush\n",e);
}
...
fprintf(stderr, "GetElementById:: 0x%8x found, but not in hash\n",e);
}


Logging output just before the crash:

Document 0x1d40ee00 destroyed
<... other unrelated log statements ...>
GetElementById called on document 1d40ee00
GetElementById:: 0x3b8aec10 found in hash before flush

This seems to support the idea that the hash map is the cause.

It seems like I could further test this by calling nsHTMLDocument::InvalidateHashTables(). This isn't accessible from the nsDocument base class though, should I create a derived version of Destroy() for nsHTMLDocument that calls nsDocument::Destroy() and then calls InvalidateHashTables(), or is there some other mechanism I could use to make sure it happens at the right time?
Comment 38 Pete Warden 2007-04-14 09:01:21 PDT
Created attachment 261555 [details] [diff] [review]
Proposed patch

Here's a proposed fix. I've added an implementation of Destroy to nsHTMLDocument, that overrides the original virtual nsDocument implementation.
void
nsHTMLDocument::Destroy()
{
  nsDocument::Destroy();
  InvalidateHashTables();
}

It calls back to the the nsDocument Destroy, and then invalidates the hash as an additional step. I've tested this, and I'm no longer able to reproduce the crash with the steps that caused it before.

This patch is against BonEcho. I'm a mozilla newbie, going from http://www.mozilla.org/hacking/life-cycle.html it sounds like I should really do this against top-of-tree, and then maybe it could get backported if necessary for a 2.0.x release?
Comment 39 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-14 17:34:01 PDT
Please swap the calls and invalidate the hashes first. This is because we're still in a bad state after the Destroy call, so we should do as little as possible then.
Comment 40 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-17 17:32:05 PDT
Lets block on this since it's a topcrasher. 
Comment 41 Pete Warden 2007-04-18 17:36:06 PDT
Created attachment 262060 [details] [diff] [review]
Updated Patch

Thanks Jonas, that makes sense. I originally placed it last in case anything in the base destroy added objects to the hash, but looking through the code it's pretty clear nothing does or should.

Here's the updated patch, the only change from the previous one is the move of the  invalidate call. I've tested it locally with my repro steps, and it no longer crashes. I've only tested on BonEcho, I will add a comment once I've built and run with it on the latest trunk. I assumed that won't block review?
Comment 42 Pete Warden 2007-04-21 12:59:43 PDT
I've now tested with the latest Mac nightly (21st April). I can still repro the crash in the nightly builds, though the crash location seems to vary more. I've seen it in nsHTMLDocument::getAttribute() for example.

I've also built my own version of Minefield, with my patch applied, and I no longer see the crash when following the repro steps.

The callstack for the nightly crashes starts from XMLHttpRequest, so I believe it's the same underlying issue (a freed object pulled from the hash table and used).

Based on this, it seems like the fix is still needed, and should go into 1.9 as well as 1.8.x
Comment 43 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-25 22:22:50 PDT
Comment on attachment 262060 [details] [diff] [review]
Updated Patch

You should also set some flag indicating that this doc has been destroyed, and then make the implementation of getElementById not put new things into the hash if that flag is set.
Comment 44 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-25 22:23:23 PDT
Sorry for not mentioning that earlier :(
Comment 45 Pete Warden 2007-04-26 08:47:02 PDT
(In reply to comment #44)
> Sorry for not mentioning that earlier :(

np :)

I'll add some code using the existing NSDocument::mIsGoingAway flag in getElementById(). That only gets set in NSDocument::destroy(), and looks like it's used for similar "do something different when the document's destroyed" behaviors in a couple of other functions. Sound reasonable?

Once I've implemented and tested that change, I'll put up a new patch.
Comment 46 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-04-26 16:19:47 PDT
Sounds great.
Comment 47 Pete Warden 2007-05-02 08:54:05 PDT
Created attachment 263474 [details] [diff] [review]
Proposed patch with hash adding disabled on destroyed documents

Here's a patch with all of the places where an object might be added to the hash table guarded with checks on mIsGoingAway.

I had to touch several functions in nsHtmlDocument, in addition to GetElementById.  I searched the code for any use of mIdAndNameHashTable with the PL_DHASH_ADD operation to decide where to put the check.

I made AddToIdTable, UpdateIdTableEntry and ResolveName return early without doing anything if the document was destroyed. 

The one function I didn't touch was ReserveNameInHash, since this is only used in the init and resetURI methods, and the entries it creates are not normal elements at risk of being deallocated.

The changes I made to GetElementById were more involved than I'd like, because the path we want to take (calling MatchElementId and then CallQueryInterface) is at the bottom of the function, and interleaved with hash table accesses that I had to guard.

An alternative for GetElementById is to duplicate that code in an early returning block at the top of the function, so we don't have to guard all the subsequent hash code. This has the disadvantage of making it harder to maintain and keep the two duplicate code blocks in sync going forward. My current approach seems lower risk and more maintainable. A third alternative would be to return null from GetElementById when the document had been destroyed, but this seems too big a change in behavior.

I've tested this against my original steps, and it does fix the problem. It does slightly change the behavior of some of the document functions after destruction, so it would be good to see it tested against some other DOM intensive pages.
Comment 48 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-05-03 14:33:21 PDT
Reassigning since Pete is the one actually working on this.
Comment 49 Pete Warden 2007-05-10 08:04:21 PDT
As a note for anyone looking for a workaround until we get this in a release, I switched my extension over to retrieving the element by name rather than ID. Since this doesn't use the hash table, it appears to avoid the crash and work on destroyed documents.

The released plugin is up at http://petesearch.com/
Comment 50 Pete Warden 2007-05-16 19:55:53 PDT
Created attachment 265080 [details] [diff] [review]
Updated patch, same as above but using -w to hide whitespace differences

After chatting to Jonas, I've reformatted the patch to hide the large number of whitespace differences caused by the indenting, to make it easier to understand, and put it back in for review.
Comment 51 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-05-17 15:15:58 PDT
Comment on attachment 265080 [details] [diff] [review]
Updated patch, same as above but using -w to hide whitespace differences

>@@ -2503,34 +2518,38 @@ nsHTMLDocument::GetElementById(const nsA
>   if (e == ID_NOT_IN_DOCUMENT) {
>     // We've looked for this id before and we didn't find it, so it
>     // won't be in the document now either (since the
>     // mIdAndNameHashTable is live for entries in the table)
> 
>     return NS_OK;
>   }
> 
>+  }

No need for the empty line between the two '}' lines.

>       // the fact that it's not in the document
>+      if (entry)
>       entry->mIdContent = ID_NOT_IN_DOCUMENT;
> 
>       return NS_OK;
>     }
> 
>     // We found an element with a matching id, store that in the hash
>+    if (entry)
>     entry->mIdContent = e;
>   }

Please use { } around the two above new if-statements. 


> nsresult
> nsHTMLDocument::ResolveName(const nsAString& aName,
>                             nsIDOMHTMLFormElement *aForm,
>                             nsISupports **aResult)
> {
>   *aResult = nsnull;
> 
>-  if (IsXHTML()) {
>+  if (IsXHTML()||mIsGoingAway) {

Put spaces around the ||.

r/sr=me with that fixed.
Comment 52 Olli Pettay [:smaug] 2007-06-06 00:11:05 PDT
Do we need a separate patch for branch or does the patch work there too?
This is still a 2.0.0.x topcrasher.
Though, first the patch needs to land on trunk.
Pete, could you update the patch, I can then check it in.
Comment 53 Pete Warden 2007-06-07 00:26:15 PDT
Created attachment 267546 [details] [diff] [review]
Updated patch, as above but with conforming brace style

Here's the cleaned up patch, diffed against the 2.0.3 branch. I'll update and build against the main trunk, test it there, and then post another patch.
Comment 54 Pete Warden 2007-06-08 08:35:38 PDT
Created attachment 267717 [details] [diff] [review]
Patch as above, but against the trunk rather than the 2.0.3 branch

Here's the diff against the trunk. The internals of HTMLDocument have changed enough that I had to do a pretty manual merge, so it might be easier to use the previous patch for the 2.0.x branch when it needs to be added there.

I've tested it, and I got the crash with the attached xpi and described steps without this change on last night's trunk, and couldn't repro it after my changes.

I do wonder if we shouldn't change the hash so that it takes ownership of the objects added to it as a longer-term fix? Anyway, this patch is a solution for the immediate crash at least.
Comment 55 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-06-08 10:04:23 PDT
Please hold off landing this on trunk as I'm working on a different fix there. Please do land on branch asap though (after getting approval of course).
Comment 56 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-06-18 15:51:46 PDT
This should be fixed on trunk with the patch from bug 348156
Comment 57 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-06-18 17:06:04 PDT
Reopening since bug 348156 was backed out.
Comment 58 Pete Warden 2007-06-21 22:14:24 PDT
Created attachment 269341 [details] [diff] [review]
Patch as 267546, but against 2.0.4 branch, with whitespace shown

Here's the patch against FIREFOX_2_0_0_4_RELEASE
I've included whitespace differences, as per Olli's request, and tested locally.
Comment 59 Pete Warden 2007-06-21 22:16:26 PDT
Comment on attachment 269341 [details] [diff] [review]
Patch as 267546, but against 2.0.4 branch, with whitespace shown

Requested approval on this patch.
Comment 60 Daniel Veditz [:dveditz] 2007-06-22 11:03:07 PDT
Comment on attachment 269341 [details] [diff] [review]
Patch as 267546, but against 2.0.4 branch, with whitespace shown

Jonas, would you mind a brief re-review here to make sure the branch patch isn't going to stumble on some trunk/branch difference?
Comment 61 Daniel Veditz [:dveditz] 2007-06-22 23:09:09 PDT
Comment on attachment 269341 [details] [diff] [review]
Patch as 267546, but against 2.0.4 branch, with whitespace shown

approved for 1.8.1.5 and 1.8.0.13, a=dveditz
Comment 62 Olli Pettay [:smaug] 2007-06-25 02:12:02 PDT
attachment 269341 [details] [diff] [review] checked in to branches.



Comment 63 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-10-09 13:33:29 PDT
This is now a trunk only bug, afaict, so changing the Version field to Trunk.
Comment 64 Mike Schroepfer 2007-11-14 12:27:05 PST
Is there an updated patch for trunk needed here?
Comment 65 Olli Pettay [:smaug] 2007-11-14 12:40:31 PST
No. Bug 348156 should fix this one too.
But better to test after that has landed.
Comment 66 Jonas Sicking (:sicking) No longer reading bugmail consistently 2007-11-30 15:47:19 PST
Should be fixed by patch in bug 348156

Note You need to log in before you can comment on or make changes to this bug.